Abstract

The heat channel is defined by an analog filter and a subsequent measurement of the filter output signal perturbed by additive white Gaussian noise. The filter is related to the heat kernel of the quantum mechanical harmonic oscillator, hence the name of the channel. The channel is modeled as an infinite-dimensional vector Gaussian channel, and the capacity in terms of average energy of the input signal is derived along with a method of capacity achieving signaling for the continuous-time channel. Then, a characterization of the capacity by water-filling in the time-frequency plane is stated and proved. We compare our findings with a classical capacity result of Gallager. A related problem in rate distortion theory is investigated to some extent. Finally, a second formula for the capacity of the heat channel based on average energy of the measured perturbed filter output signal is derived. The result is interpreted in the context of estimation theory, and a parallel to the I-MMSE relationship due to Guo et al. is presented, connecting the capacity of the heat channel with an estimation error for the output.



1 Introduction 

The conduction of heat in solid bodies was mathematically described and solved by Joseph Fourier in his fundamental 1822 treatise Théorie analytique de la chaleur [1]. In one dimension, e.g., in the case of a heat-conducting insulated wire, his description results in the partial differential equation (the heat equation)

$$\frac{\partial u}{\partial t} = \kappa\,\frac{\partial^2 u}{\partial x^2},$$

in which u = u(x,t) is the temperature at time t > 0 at any point x, and κ is a positive constant depending on the material.

Figure 1: Spreading of heat / signal spread because of dispersion in an optical fiber (attenuation not regarded). (a) Initial temperature distribution / fiber input signal, f(x). (b) Temperature shortly after / output signal, short fiber (β = 10). (c) Temperature at a later time / output signal, longer fiber (β = 2). Arrows indicate the direction of spatial propagation of heat or optical intensity.

Given the initial temperature distribution f(x) for a wire of infinite length, Fourier's solution of the heat equation is

$$u(x,t) = \frac{1}{\sqrt{4\pi\kappa t}} \int_{-\infty}^{\infty} e^{-\frac{(x-y)^2}{4\kappa t}}\, f(y)\,dy. \tag{1}$$

Since, in general, for any time t > 0 the inversion of the integral transform appearing in (1) is unfeasible in practice [2], we observe an unavoidable loss of "information" (in a preliminary, informal sense). In Fig. 1, several temperature distributions u(x,t) are depicted, showing how the initial one is gradually smeared out by the propagation of heat.

A similar situation, known in principle since the earliest days of cable communication [3], arises in fiber optics. In a transmission through a (single-mode) optical fiber, signals experience, besides attenuation, a spread over time due to (chromatic) dispersion (see, e.g., [4], [5]). A frequently used model for dispersion in an optical fiber [6] is a linear time-invariant (LTI) filter with impulse response (cf. Fig. 2)

$$h_1(t) = \frac{1}{\sqrt{2\pi}\,(1/\beta)}\, e^{-\frac{t^2}{2(1/\beta)^2}}, \tag{2}$$

i.e., a Gaussian filter with the standard deviation 1/β, β ∈ (0, ∞) characterizing dispersion (the parametrization is chosen to fit later notation). The fiber input/output relation is now given by the convolution integral

$$u(t) = a \int_{-\infty}^{\infty} h_1(t-t')\, f(t')\,dt', \tag{3}$$

where f ∈ L²(ℝ) is the finite-energy input signal and u(t) the output, the constant factor a ∈ (0, 1] representing attenuation. Except for the factor a and the change of physical dimension from position (variable x) to time (variable t), we now observe a perfect analogy between Eqs. (1) and (3). As a consequence, Fig. 1 also displays the degradation by dispersion in an optical fiber of an optical signal, initially a sequence of bits (here, binary symbols) obtained by intensity modulation and on-off keying. Obviously, dispersion limits the information throughput of an optical fiber because of intersymbol interference (ISI). A maximum attainable bit rate can be estimated by considering as fiber input a sequence of unit impulses separated by time intervals of duration T, the output then being a sequence of Gaussian pulses of standard deviation Δt = 1/β. In order to cope with ISI, a popular criterion is Δt ≤ T/4 [3], resulting in our case in the estimate

$$R_b = \frac{1}{T} \le \frac{\beta}{4}$$

for the bit rate R_b (in binary symbols per second).

If the fiber output signal is corrupted by noise, this rule of thumb becomes questionable. In the case of additive white Gaussian noise (AWGN), typically caused by an optical amplifier [4], [6], we arrive at a continuous-time (or waveform) channel following the model in Gallager's 1968 book [7]; see Fig. 2(a). Here, of course, we supposed that, e.g., the power spectral density (PSD) of the AWGN is independent of the input (see the recent paper [8] for the opposite situation), and that the input power is not so high that arising nonlinearities [4] would destroy the linear model (3); we refer to [5] for a variety of other possible perturbations that limit the capacity of optical fiber communication systems. Gallager's waveform channel, consisting of an LTI filter and subsequent AWGN (in [7] even nonwhite Gaussian noise is considered), may be viewed as a generalization of the bandlimited Gaussian channel of Shannon [9], [10] (see Fig. 2(b)), because in the case of ideal low-pass filters and AWGN the two channel models are equivalent. The capacity of the Gallager channel is given in the result [7, Theorem 8.5.1], referenced by many as "the" solution to the capacity problem [11]. When applied to the above case of a Gaussian LTI filter and AWGN, several questions arise in connection with Gallager's result. First, capacity is given in parametric form, and a closed-form expression remains a challenge. Secondly, there is no apparent method for capacity achieving signaling in [7, Chapter 8]. Finally, as will become evident in the present paper, Gallager's capacity formula would considerably underestimate the attainable capacity (or spectral efficiency) of a communication system containing the model of a Gaussian filter and subsequent AWGN (possibly among other features); actually, adding certain system components as indicated in Fig. 2(a) would lead to a compound channel of significantly higher spectral efficiency.
Figure 2: Continuous-time (or waveform) channels. (a) Gallager model [7, Figure 8.4.1] (in dashed box); LTI filter with impulse response h_1(t) ∈ L²(ℝ). (b) Bandlimited Gaussian channel [9], [10]; here, h_1(t) is the impulse response of an ideal low-pass filter. The input signal x(t) is always power-limited, and the noise waveform n(t) is a realization of white Gaussian noise.



In this paper, we investigate the linear time-varying (LTV) filter (or operator) P_δ^{α,β}: L²(ℝ) → L²(ℝ) given by

$$(P_\delta^{\alpha,\beta} f)(t) = \frac{\beta}{\sqrt{2\pi\cosh\delta}}\; e^{-\frac{t^2}{2\alpha^2}} \int_{-\infty}^{\infty} \exp\!\left[-\frac{\beta^2}{2}\left(\frac{t}{\cosh\delta} - t'\right)^{2}\right] f(t')\,dt', \tag{4}$$

where t stands for time and α, β are any positive numbers satisfying αβ > 1; the positive parameters γ, δ are defined by γ² = α/β, coth δ = αβ. This operator is an instance of a time-frequency localization operator used in signal analysis for the approximate concentration of a signal in both time and frequency [12]. Operators based on radial Gaussian weights on the time-frequency plane were introduced by Daubechies in the seminal paper [13]; a generalization to arbitrary Gaussian weights leads to the above operator. The Fourier transform of the filter output signal g = P_δ^{α,β} f is



$$\hat g(\omega) = \frac{\alpha}{\sqrt{2\pi\cosh\delta}}\; e^{-\frac{\omega^2}{2\beta^2}} \int_{-\infty}^{\infty} \exp\!\left[-\frac{\alpha^2}{2}\left(\frac{\omega}{\cosh\delta} - \omega'\right)^{2}\right] \hat f(\omega')\,d\omega', \tag{5}$$

where ω is angular frequency and the convention f̂(ω) = (2π)^{−1/2} ∫ e^{−iωt} f(t) dt (integral over ℝ) for the Fourier transform has been used [14]. The condition αβ > 1 is now seen as imposed by the uncertainty principle of communications [15]. Interestingly enough, the kernel of operator P_δ^{α,β} coincides with the heat kernel [16] of the quantum mechanical harmonic oscillator¹. Because of the Gaussian prefactor on the right-hand side (RHS) of Eq. (5), the Fourier transform ĝ(ω) of the filter output decays exponentially outside of the interval [−πβ, πβ] (provided that the energy of the input f is not too large). Thus, g may be considered an approximately bandlimited signal of approximate bandwidth W = β/2 in positive frequencies measured in hertz.

If u(t) is the output of the LTI filter (3) (where a = 1) with impulse response (2) upon input f, then the output g of the LTV filter (4) may be written as

$$g(t) = e^{-\frac{t^2}{2\alpha^2}} \cdot (\cosh\delta)^{-1/2}\, u(t/\cosh\delta). \tag{6}$$

Thus, g(t) is just the dilated LTI filter output u(t) multiplied by a Gaussian time window; notice that the dilation factor, cosh δ, is close to one when αβ is large. In Fig. 2(a), those two operations are shown as additional components of the heat channel, the latter meaning the continuous-time channel formed by the LTV filter (4) and subsequent AWGN. A precise formal definition of the heat channel (whose name is chosen after the aforementioned heat kernel) will be given in Section 2.

The goal of the present paper is to quantify the capacity of the heat channel in various ways, to relate the results to other work, and to discuss the implications.

1.1 Related Work 

The computation of the capacity of time-invariant waveform channels, pioneered by Holsinger [17], was put on a firm ground by Gallager, who gave a rigorous proof for the water-filling characterization in [7, Theorem 8.5.1]. The overall approach rests on orthogonal expansion of the channel input, resulting in a discretization of the continuous-time channel in the form of parallel Gaussian channels whose capacity is known; the corresponding discrete water-filling formulas are also to be found in [7] (where they are attributed to Ebert [18]). The result is then transposed to the frequency domain by means of a specific Szegő theorem (whose proof in [7] is based on previous Szegő type results in [19], [20]). The quest for water-filling characterizations for the capacity of time-varying channels in the time-frequency plane is now an active area of research; see [21], [22], [23], to cite only a few. A water-filling characterization for time-varying channels that are periodic in time has recently been given by Jung [24]. Our Szegő theorem (Theorem 6), based on a Weyl symbol connected with the LTV filter, is perhaps closest to the one proved in [23]. The use of Weyl symbols (and the related Wigner-Ville spectrum) has a long history in communications; see, e.g., [25], [26], [27]. Concerning reverse water-filling representations for the rate distortion function of stationary Gaussian processes in the frequency domain, we refer to Berger [28] (his approach is also based on [19], [20]). The "C-LLSE formula" in Section 5.2 and the ensuing discussion draw inspiration from Guo et al. [29]. The I-MMSE relationship in connection with SNR-dependent input signals at low SNR is treated in the recent paper [30].

¹ In [16, p. 114], the heat kernel of the one-dimensional quantum mechanical harmonic oscillator with Hamiltonian H = −d²/dx² + a²x² takes the form of the kernel of operator (4) after the substitution 2at = δ, a = γ⁻².

1.2 Contribution 

We derive a closed-form expression for the capacity of the heat channel in terms of average energy of the input (Theorem 1) and infer a similar expression in terms of average input power. Moreover, we present a method of capacity achieving signaling, even in the form of short pulses when the dispersion parameter β is large enough. The water-filling characterization for the capacity of the heat channel in the time-frequency plane (Theorem 2) is the first clear-cut, full-fledged example of its kind for a time-varying channel. We find the surprising fact that the spectral efficiency of the heat channel in the limiting case α → ∞ (and SNR not too small) is significantly higher than that of the corresponding Gallager channel, i.e., the time-invariant waveform channel with Gaussian LTI filter and subsequent AWGN. The rate distortion function in closed form (Theorem 3) is one of the rare examples of such a representation in rate distortion theory; the reverse water-filling representation in the time-frequency plane (Theorem 4) is a novel result for a nonstationary Gaussian process. The (reverse) water-filling formulas are based on a new, specific Szegő theorem (Theorem 6), for which we give a straightforward, essentially self-contained proof (the only external ingredient is the "trace rule" from [24]). We supplement the I-MMSE relationship discovered by Guo et al. [29] with a new "C-LLSE formula" (Proposition 1) that connects, in the specific context of the heat channel, the increase of channel capacity with an estimation error for the channel output (a setting where the channel input is necessarily SNR-dependent). The capacity results in this paper may serve as lower bounds for the attainable information throughput of single-mode optical fiber communication systems in the presence of chromatic dispersion and amplifier noise.

1.3 Organization 

The remainder of the paper is organized as follows. In Section 2, the heat channel is defined. In Section 3, a closed formula for the capacity of the heat channel is given along with optimal signaling; the water-filling characterization of channel capacity in the time-frequency plane is also presented there. A related problem in rate distortion theory is investigated in Section 4. Section 5 deals with a second closed capacity formula for the heat channel and the relation to estimation theory. Finally, Section 6 concludes the paper.

2 The Heat Channel 

The time-varying channel formed by the LTV filter (4) and subsequent AWGN, cf. Fig. 2(a), will now be reduced to an ordered set of parallel Gaussian channels (or a vector Gaussian channel). We follow the overall approach of [7], where it was applied to time-invariant channels.

Figure 3: Hermite functions ψ_0(t) (a normalized Gaussian function), ψ_2(t), and two higher-order Hermite functions. Strong decay in time is complemented by the same behaviour in the frequency domain, since Hermite functions are eigenfunctions of the Fourier transform.
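From now on we use the parameter

$$p = e^{-\delta}, \qquad \delta = \operatorname{arccoth}(\alpha\beta) = \frac{1}{\alpha\beta} + \frac{1}{3(\alpha\beta)^3} + \cdots. \tag{7}$$

Note that δ ~ 1/(αβ), or δ(αβ) = 1/(αβ) + o(1/(αβ)), as αβ → ∞.²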
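As a quick numerical check of (7), the sketch below evaluates δ = arccoth(αβ) through the identity arccoth x = ½ ln((x + 1)/(x − 1)) and compares it with the first two terms of the series. The helper name `heat_params` is our own illustration and not taken from the paper.

```python
import math

def heat_params(alpha: float, beta: float):
    """Return (gamma, delta, p) with gamma^2 = alpha/beta, coth(delta) = alpha*beta, p = e^{-delta}."""
    ab = alpha * beta
    if ab <= 1.0:
        raise ValueError("the time-frequency product alpha*beta must exceed 1")
    gamma = math.sqrt(alpha / beta)
    delta = 0.5 * math.log((ab + 1.0) / (ab - 1.0))  # arccoth(alpha*beta)
    p = math.exp(-delta)                             # Eq. (7)
    return gamma, delta, p

for ab in (2.0, 10.0, 100.0, 1000.0):
    _, delta, p = heat_params(ab, 1.0)
    series = 1.0 / ab + 1.0 / (3.0 * ab**3)          # leading terms of the arccoth series
    print(f"alpha*beta={ab:7.1f}  delta={delta:.10f}  series={series:.10f}  p={p:.6f}")
```

Equivalently, p = ((αβ − 1)/(αβ + 1))^{1/2}, which makes p → 1 as αβ → ∞ evident.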
2.1 Diagonalization of the Filter 

As shown in [13], in the radial case α = β, i.e., γ = 1 (see [14] for the general case γ > 0), the operator P_δ^{α,β} in (4) has eigenvalues p^{k+1/2}, k = 0, 1, ..., with corresponding eigenfunctions

$$(D_\gamma\psi_k)(t) = \gamma^{-1/2}\,\psi_k(t/\gamma),$$

where ψ_k(t) = (2^k k! √π)^{−1/2} H_k(t) e^{−t²/2} is the kth Hermite function, H_k(t) = e^{t²}(−d/dt)^k e^{−t²} being the kth Hermite polynomial [31]. Since {D_γψ_k; k = 0, 1, ...} forms a complete orthonormal basis of L²(ℝ), any function f ∈ L²(ℝ) has an expansion f(t) = Σ_{k=0}^{∞} x_k (D_γψ_k)(t), where the coefficient sequence x_0, x_1, ... is an element of the space ℓ²(ℕ_0) of square-summable complex sequences with index set ℕ_0 = {0, 1, ...}. Hence, for any filter input signal f ∈ L²(ℝ), the filter output signal has the representation

$$(P_\delta^{\alpha,\beta} f)(t) = \sum_{k=0}^{\infty} p^{k+1/2}\, x_k\, (D_\gamma\psi_k)(t), \tag{8}$$

where x_k = ⟨f, D_γψ_k⟩, with ⟨f_1, f_2⟩ = ∫ f_1(t) f_2(t)* dt (integral over ℝ, * denoting complex conjugation) the inner product in L²(ℝ). The new coefficient sequence (p^{k+1/2} x_k)_{k=0}^{∞} is again an element of ℓ²(ℕ_0). Thus, the filter P_δ^{α,β} is reduced to a diagonal linear transformation in ℓ²(ℕ_0).

² We use the standard Landau symbols O(·) ("big-O") and o(·) ("little-o").
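The diagonalization can be checked numerically. The following minimal sketch (assuming numpy; the function names are hypothetical) discretizes the kernel of (4) on a uniform grid, assembling it as in Eq. (6), and compares the largest eigenvalues of the resulting symmetric matrix with p^{k+1/2}. A plain rectangle-rule (Nyström) discretization is our own choice here; grid width and resolution are assumptions that merely need to cover the eigenfunctions considered.

```python
import numpy as np

def ltv_filter_matrix(alpha, beta, t):
    """Nystrom (rectangle-rule) discretization of the LTV filter (4) on a uniform grid t.

    The kernel is assembled as in Eq. (6): Gaussian LTI filter h_1 of standard
    deviation 1/beta, dilation by cosh(delta) with factor cosh(delta)**-0.5,
    then multiplication by the time window exp(-t^2 / (2 alpha^2)).
    """
    delta = np.arctanh(1.0 / (alpha * beta))        # arccoth(alpha*beta)
    c = np.cosh(delta)
    dt = t[1] - t[0]
    T, Tp = np.meshgrid(t, t, indexing="ij")        # T: output time t, Tp: input time t'
    h1 = beta / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (beta * (T / c - Tp)) ** 2)
    window = np.exp(-T**2 / (2.0 * alpha**2))
    return window * h1 / np.sqrt(c) * dt

def leading_eigenvalues(alpha, beta, n=6, half_width=12.0, points=961):
    """Largest n eigenvalues of the discretized operator."""
    t = np.linspace(-half_width, half_width, points)
    M = ltv_filter_matrix(alpha, beta, t)
    M = 0.5 * (M + M.T)                             # the kernel is symmetric; drop round-off asymmetry
    return np.sort(np.linalg.eigvalsh(M))[::-1][:n]

alpha, beta = 4.0, 1.0                              # gamma = 2, coth(delta) = 4
p = np.exp(-np.arctanh(1.0 / (alpha * beta)))
k = np.arange(6)
print(np.column_stack([leading_eigenvalues(alpha, beta), p ** (k + 0.5)]))
```

The two printed columns should agree closely also for γ ≠ 1, in line with the statement that only the eigenfunctions, not the eigenvalues, depend on γ.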

In Fig. 3, some Hermite functions are depicted; observe their strong decay in time (and frequency).

2.2 Channel Model 

The noiseless filter output signal g = P_δ^{α,β} f ∈ L²(ℝ) is corrupted by AWGN of, say, two-sided PSD N_0/2 = θ². Upon observation of the noisy filter output signal g̃(t) = g(t) + n(t), where n(t) is a realization of the noise, we want to reconstruct the filter input signal f ∈ L²(ℝ) as accurately as possible. In the noiseless case, computation of the coefficients y_k = ⟨g, D_γψ_k⟩ = p^{k+1/2} x_k would allow us to recover the coefficients x_k, k = 0, 1, ..., thus capturing f.

In the presence of noise, the best we can do is to measure the coefficients y_k by optimal detection (due to North [32]), for example by means of a matched filter [33]: in our case, the LTI filter(s) with impulse response h_k(t) = (D_γψ_k)(−t). When applied to the noisy signal g̃ as input, we get for the output at sampling instant t_0 = 0

$$\int_{-\infty}^{\infty} h_k(t_0 - t')\,\tilde g(t')\,dt' = p^{k+1/2} x_k + \int_{-\infty}^{\infty} h_k(-t')\, n(t')\,dt'.$$

From the theory of LTI filters we know (cf. [7, p. 365]) that the integral on the RHS evaluates to a realization n_k of a zero-mean Gaussian random variable N_k with the variance θ² ∫ h_k²(t) dt (integral over ℝ). Since any waveform D_γψ_k has norm one, the variance of N_k is θ² and, thus, does not depend on k. Moreover, because of orthogonality of the waveforms, the random variables N_k are independent. Consequently, the detection errors n_k are realizations of independent identically distributed (i.i.d.) zero-mean Gaussian random variables N_k ~ 𝒩(0, θ²), k = 0, 1, .... Note that the noise PSD θ², measured in watts/Hz, also has the dimension of an energy.
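The claim that the matched-filter outputs N_k are i.i.d. with variance θ² can be illustrated by a small Monte Carlo experiment. This is only a sketch under stated assumptions: numpy is used, the radial case γ = 1 is taken, and white noise of two-sided PSD θ² is approximated on a grid of step Δt by i.i.d. samples of variance θ²/Δt (a common discretization, not part of the paper).

```python
import numpy as np

def hermite_functions(K, x):
    """Orthonormal Hermite functions psi_0, ..., psi_{K-1} on the grid x (three-term recurrence)."""
    psi = np.zeros((K, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-x**2 / 2.0)
    if K > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, K - 1):
        psi[k + 1] = np.sqrt(2.0 / (k + 1)) * x * psi[k] - np.sqrt(k / (k + 1)) * psi[k - 1]
    return psi

rng = np.random.default_rng(seed=1)
theta2 = 0.01                                   # two-sided PSD N_0/2 = theta^2
t = np.linspace(-8.0, 8.0, 801)
dt = t[1] - t[0]
psi = hermite_functions(4, t)                   # radial case gamma = 1: D_gamma psi_k = psi_k
trials = 5000
# discrete stand-in for white noise: i.i.d. samples with variance theta^2 / dt
noise = rng.normal(0.0, np.sqrt(theta2 / dt), size=(trials, t.size))
# matched-filter outputs at t_0 = 0, i.e. the integrals of psi_k(t') n(t') dt'
N = noise @ psi.T * dt                          # shape (trials, 4)
print("sample variances:", N.var(axis=0))
print("sample correlation of N_0 and N_1:", np.corrcoef(N[:, 0], N[:, 1])[0, 1])
```

All four sample variances should be close to θ² = 0.01, and the sample correlation close to zero, reflecting unit norm and orthogonality of the waveforms.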

Instead of y_k we have now obtained ỹ_k = p^{k+1/2} x_k + n_k. So we get the estimate x̃_k = p^{−k−1/2} ỹ_k = x_k + z_k for x_k, where the z_k are realizations of independent Gaussian random variables Z_k ~ 𝒩(0, θ² p^{−2k−1}). Thus, we are led to the following definition.

Definition 1 Let θ be any positive number and let p ∈ (0, 1) be as in (7). Then the heat channel is the infinite-dimensional vector Gaussian channel

$$Y_k = X_k + Z_k, \qquad Z_k \sim \mathcal{N}\!\left(0,\ \theta^2 p^{-2k-1}\right), \qquad k = 0, 1, \ldots, \tag{9}$$

where the noise Z_k is assumed to be independent from subchannel to subchannel.

Figure 4: Balance of (average) energies around the heat channel: input/output energy (E_in/E_out), energy of measurement error (E_err), and distortion (E_dist1 + E_dist2). Subchannels displayed at distance δ apart as αβ → ∞.

The extra factor p^{−1} in the noise variances is, of course, of no relevance; for the interpretation of θ² as entropy power of the measurement error we refer to [34].

Let X^K = (X_0, ..., X_{K−1})^T denote a K-dimensional column vector, K ∈ ℕ, of not necessarily independent random variables X_k. For any S > 0, the vector Gaussian channel consisting of the first K subchannels of the heat channel has capacity [9], [10]

$$C^K(S) = \max_{E\{\|X^K\|^2\} \le S} I(X^K; Y^K), \tag{10}$$

where I(X^K; Y^K) is the mutual information between the input vector X^K and the output vector Y^K, subject to the average energy constraint E{‖X^K‖²} = Σ_{k=0}^{K−1} E{X_k²} ≤ S. The noise variances θ² p^{−2k−1}, k = 0, 1, ..., are monotonically increasing and unbounded. Consequently, by reason of the well-known water-filling argument (see, e.g., [10]), for any fixed average input energy S, the sequence of capacities C^1(S), C^2(S), ... eventually becomes constant. We define

$$C(S) = \lim_{K\to\infty} C^K(S) \tag{11}$$

as the capacity of the heat channel; C(S) is measured in bits per channel use (or transmission). In Fig. 4, E_in = S, and the K subchannels are to be read in reverse order (as will become clear later); further details of Fig. 4 will be described in the text.
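The water-filling argument behind (10) and (11) is easy to carry out numerically because the noise variances θ² p^{−(2k+1)} are increasing. The following sketch (pure Python; the function name is hypothetical) adds subchannels one at a time until the next one would receive no energy, which yields the finite K and the water level.

```python
import math

def heat_channel_capacity(S, theta2, alpha_beta):
    """Water-filling capacity C(S) of the heat channel (9), cf. Eqs. (10) and (11).

    Subchannel k has noise variance theta2 * p**-(2k+1) = theta2 * exp((2k+1) * delta),
    delta = arccoth(alpha*beta). Returns (bits per transmission, K, water level).
    """
    delta = 0.5 * math.log((alpha_beta + 1.0) / (alpha_beta - 1.0))

    def noise(k):
        return theta2 * math.exp((2 * k + 1) * delta)

    K, total = 0, 0.0
    while True:
        K += 1
        total += noise(K - 1)            # sum of the first K noise variances
        level = (S + total) / K          # water level if exactly K subchannels carry energy
        if level <= noise(K):            # subchannel K would stay dry, so K is final
            break
    nats = sum(0.5 * math.log(level / noise(k)) for k in range(K))
    return nats / math.log(2.0), K, level

bits, K, level = heat_channel_capacity(S=1.0, theta2=0.01, alpha_beta=100.0)
print(f"C = {bits:.2f} bits per transmission, K = {K} active subchannels")
```

With the parameters of the numerical example of Fig. 9 (αβ = 100, θ² = 0.01, S = 1.0), this sketch should reproduce the values reported there, about 29.47 bits per transmission and K = 64. The stopping rule is valid because, once the water level drops to the next noise variance, it stays below all later (larger) ones.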
2.3 Example: Dispersion and Amplifier Noise in Fiber Optics



When, as supposed here, intensity modulation is used in the fiber-optic transmission, real-valued input signals f ∈ L²(ℝ) need to be replaced by waveforms τ_0 + f(t), where τ_0 is a fixed positive number chosen large enough so that the resulting signals are nonnegative with high probability [cf. Fig. 9(a), below]. We model dispersion by an LTI filter with impulse response (2). The dispersion parameter β = β(L) ∈ (0, ∞) in (2) depends on the fiber length L, as does the coefficient a = a(L) ∈ (0, 1] in (3) representing attenuation. Then, the fiber output signal is the waveform aτ_0 + u(t), where u(t) is given by Eq. (3).

In order to create a heat channel, choose at the receiver any fixed time parameter α with the property that α > 1/β. When the product αβ is large, the dilation in (6) may be skipped in practice (for simplicity of exposition, we imagine having performed the dilation). Now, apply a variable density neutral filter of appropriate characteristic (an optical device, see [35]) to effect, in the spatial domain, a multiplication of the signal with the time window q_α(t) = exp[−t²/(2α²)] as in (6). Next, amplify the obtained signal by the factor 1/a. When an optical amplifier is employed, the resulting signal is c_0 q_α(t) + g(t) + n(t), where c_0 = τ_0 (cosh δ)^{−1/2}, g = P_δ^{α,β} f, and n(t) is a realization of white Gaussian noise (properly modeling the impairment of an optical signal by an optical amplifier; see [8]). After opto-electric conversion (possibly adding new white Gaussian noise to the signal), remove the known signal component c_0 q_α(t). Finally, use as detection device a bank of matched filters (cf. [6]) with impulse responses h_k(t) as in Section 2.2, k = 0, 1, ..., K − 1, where K is a known number for any given average input energy S.

Thus, we have implemented a heat channel in an optical fiber communication 
system. 

2.4 Degrees of Freedom of Filter Output Signals 

Here, we give an explanation for the time-frequency product αβ that will occur very frequently in the sequel.

The Wigner-Ville spectrum (WVS) of the response of filter P_δ^{α,β} to white Gaussian noise (see Appendix A for details) is the bivariate function

$$\Phi(t,\omega) = \frac{1}{2\pi\cosh\delta}\, \exp\!\left[-\left(\frac{t^2}{\alpha^2} + \frac{\omega^2}{\beta^2}\right)\right]. \tag{12}$$

In general, the WVS of a nonstationary stochastic process gives its density of (mean) energy in the time-frequency plane; see, e.g., [27], [25]. Consequently, in our case, the energy of individual filter output signals would occupy an ellipse-shaped region in the time-frequency plane with an unsharp boundary. Regarding the WVS (12) (after a normalization) as a bivariate Gaussian probability density function, we describe this region by an approximation known in probability theory as the ellipse of concentration (EoC) [36]; the EoC has the property that the uniform distribution on it has the same first and second moments as the Gaussian distribution. In Fig. 5, the energy density in the time-frequency plane given by the WVS (12) is illustrated.

Figure 5: Energy density and ellipse of concentration (dashed line) for the white Gaussian noise response of filter P_δ^{α,β} in a limiting case of the parameters α, β; the graphical appearance is typical also for admissible values of α, β.

In our case, we obtain as EoC the region A_e = {(t, ω) ∈ ℝ²; t²/α² + ω²/β² ≤ 2} with area |A_e| = 2παβ [36]. As reported in [12, p. 23], in physics a region in phase space (or in the time-frequency plane, as here) with area A corresponds to A/(2π) "independent states" (when A is sufficiently large). In our case, we have |A_e|/(2π) = αβ. Accordingly, the time-frequency product αβ would describe the "dimension" of the filter output space (when αβ is sufficiently large), i.e., the degrees of freedom (DoF) of filter output signals.
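The area and the DoF count stated above follow from matching second moments. A short derivation, assuming (as the ellipse above indicates) that the normalized WVS (12) is a centered bivariate Gaussian density proportional to exp[−(t²/α² + ω²/β²)]:

```latex
% Normalized WVS: independent centered Gaussians in t and omega
\frac{\Phi(t,\omega)}{\iint \Phi\,dt\,d\omega}
  = \frac{1}{\pi\alpha\beta}\, e^{-t^2/\alpha^2 - \omega^2/\beta^2}
  \;\Rightarrow\; \operatorname{Var}(t) = \frac{\alpha^2}{2}, \qquad \operatorname{Var}(\omega) = \frac{\beta^2}{2}.
% Uniform distribution on the ellipse t^2/a^2 + omega^2/b^2 <= 1:
\operatorname{Var}(t) = \frac{a^2}{4}, \qquad \operatorname{Var}(\omega) = \frac{b^2}{4}.
% Matching second moments gives a^2 = 2\alpha^2 and b^2 = 2\beta^2, hence
A_e = \Big\{ \frac{t^2}{\alpha^2} + \frac{\omega^2}{\beta^2} \le 2 \Big\}, \qquad
|A_e| = \pi a b = 2\pi\alpha\beta, \qquad \frac{|A_e|}{2\pi} = \alpha\beta .
```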

3 Channel Capacity in Terms of Channel Input, and Optimal Signaling

In this section, we derive a closed formula for the capacity of the heat channel in terms of average energy of the channel input signal, along with a method of capacity achieving (optimal) signaling. A characterization of channel capacity by water-filling in the time-frequency plane is also given. The capacity results will be stated in the form of asymptotic equations.

Definition 2 Any two functions A, B: (1, ∞) → ℝ are said to be asymptotically equal, and we write A ≐ B, if

$$\lim_{x\to\infty} \frac{A(x) - B(x)}{x} = 0,$$

or equivalently, A(x) = B(x) + o(x) as x → ∞.

In our context, x will always be the time-frequency product αβ > 1. Thus, A ≐ B implies that A(αβ)/(αβ) = B(αβ)/(αβ) + ε, where ε → 0 as αβ → ∞.



3.1 Channel Capacity — First Formula 

The function y = w_0(x), x ≥ 0, occurring in the next theorem is the inverse function of y = (2x − 1)e^{2x} + 1, x ≥ 0 (see also Fig. 6).
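Since the Python standard library has no Lambert W function, the following sketch evaluates w_0 by inverting its defining relation directly with bisection; the map y ↦ (2y − 1)e^{2y} + 1 is increasing for y ≥ 0 (its derivative is 4y e^{2y}). Alternatively, `scipy.special.lambertw` could be used through w_0(x) = ½[1 + W((x − 1)/e)], as derived in the proof of Theorem 1. The name `w0` is our own.

```python
import math

def w0(x, rel_tol=1e-15):
    """Inverse of y -> (2y - 1) exp(2y) + 1 on y >= 0, evaluated at x >= 0.

    The map is strictly increasing for y > 0 (its derivative is 4y exp(2y)),
    so the root can be bracketed and found by bisection.
    """
    if x < 0.0:
        raise ValueError("w0 is defined for x >= 0")

    def g(y):
        return (2.0 * y - 1.0) * math.exp(2.0 * y) + 1.0

    lo, hi = 0.0, 1.0
    while g(hi) < x:                     # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for x in (0.0, 1.0, 2.0, 10.0):
    print(x, w0(x))
```

Note that w_0(0) = 0, w_0(1) = 1/2, and [w_0(x)]² ≈ x/2 for small x, the behaviour used later for Eq. (20).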

Theorem 1 Assume that the average energy S of the input signal depends on αβ such that S(αβ) = O(αβ) as αβ → ∞. Then for the capacity (in bits per transmission) of the heat channel it holds

$$C(S) \doteq \frac{\alpha\beta}{2}\left[w_0\!\left(\frac{S}{(\alpha\beta/2)\,\theta^2}\right)\right]^2 \log_2 e. \tag{13}$$

Proof: The proof is accomplished by water-filling [10], [7, Theorem 7.5.1]. Let σ_k² = θ² p^{−(2k+1)}, k = 0, 1, ..., be the variance of noise in the kth subchannel. The positive number σ² (the water level) is defined by the condition

$$\sum_{k=0}^{\infty} \left(\sigma^2 - \sigma_k^2\right)^{+} = \sum_{k=0}^{K-1} \left(\sigma^2 - \sigma_k^2\right) = S, \tag{14}$$

where x⁺ = max{0, x}, x ∈ ℝ, and K = max{k ∈ ℕ; σ_{k−1}² < σ²} is the number of subchannels in the resulting finite-horizon version of the heat channel. With increasing time-frequency product αβ, δ = δ(αβ) (now acting as increment) tends to 0, so that

$$S\cdot\delta = \sum_{k=0}^{K-1} \left(\sigma^2 - \theta^2 e^{(2k+1)\delta}\right)\delta = \int_0^{\infty} \left(\sigma^2 - \theta^2 e^{2x}\right)^{+} dx + \varepsilon, \tag{15}$$

where ε → 0 as αβ → ∞. Observe that by the growth condition imposed on S = S(αβ) and because of δ ~ 1/(αβ), it holds limsup_{αβ→∞} S·δ < ∞, so that transition to a Riemann integral is allowed; for later reference, note that the water level σ² = σ²(αβ) also remains bounded as αβ → ∞. Evaluation of the integral yields

$$S\cdot\delta = \frac{\sigma^2}{2}\ln\frac{\sigma^2}{\theta^2} - \frac{\sigma^2 - \theta^2}{2} + \varepsilon. \tag{16}$$
The maximum in Eq. (10) is achieved when the components X_k of the input vector are independent 𝒩(0, (σ² − σ_k²)⁺) random variables, and the capacity (in nats) becomes

$$C = C^K(S) = \sum_{k=0}^{K-1} \frac12 \ln\!\left(1 + \frac{\sigma^2 - \sigma_k^2}{\sigma_k^2}\right). \tag{17}$$

Since σ² eventually remains bounded, transition to a Riemann integral in the next equations is allowed, and we get

$$C\cdot\delta = \sum_{k=0}^{K-1} \frac12 \ln\frac{\sigma^2}{\theta^2 e^{(2k+1)\delta}}\;\delta = \int_0^{\infty} \frac12 \ln^{+}\frac{\sigma^2}{\theta^2 e^{2x}}\,dx + \varepsilon,$$

where ε → 0 as αβ → ∞ and ln⁺ y = max{0, ln y}. Thus it holds

$$C\cdot\delta = \frac18 \ln^2\frac{\sigma^2}{\theta^2} + \varepsilon. \tag{18}$$

Eq. (16) is equivalent to

$$\frac{\sigma^2}{e\,\theta^2}\,\ln\frac{\sigma^2}{e\,\theta^2} = \frac{s-1}{e} + \varepsilon_1,$$

where s = S/[(αβ/2)θ²] and ε_1 → 0 as αβ → ∞. By means of the Lambert W function [37] (actually its principal branch W_0, see Fig. 6), which is the uniquely determined analytic function satisfying W(x) exp[W(x)] = x for all x ∈ [−e^{−1}, ∞) and W(0) = 0, we get after a computation

$$\frac12 \ln\frac{\sigma^2}{\theta^2} = w_0(s + \varepsilon_1'),$$

where we have set w_0(x) = ½[1 + W((x − 1)/e)], x ≥ 0, and ε_1' = e ε_1. Because of Eq. (18), this gives rise to

$$C\cdot\delta = \frac12\left[w_0(s + \varepsilon_1')\right]^2 + \varepsilon_2,$$

where ε_2 → 0 as αβ → ∞. Unlike w_0(x), which has a vertical tangent at x = 0, the function (w_0(x))² has a continuous and bounded derivative in every closed interval [0, s_0] ⊂ [0, ∞). Consequently, by the mean value theorem we obtain C/(αβ) = ½[w_0(s)]² + ε_1'' + ε_2, where ε_1'' + ε_2 → 0 as αβ → ∞. After transition from nats to bits (1 nat = log_2 e bit), Eq. (13) is obtained. □
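To see how quickly the asymptotic formula (13) becomes accurate, the sketch below compares it with exact water-filling over the subchannels of (9) while keeping s = S/[(αβ/2)θ²] fixed, i.e., S ∝ αβ as admitted by Theorem 1. Both helpers are our own illustrations (pure Python), with w_0 computed by bisection rather than through the Lambert W function.

```python
import math

def w0(x):
    """Inverse of y -> (2y - 1) exp(2y) + 1 on y >= 0 (bisection, x >= 0)."""
    def g(y):
        return (2.0 * y - 1.0) * math.exp(2.0 * y) + 1.0
    lo, hi = 0.0, 1.0
    while g(hi) < x:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def capacity_eq13(S, theta2, ab):
    """Right-hand side of Eq. (13), bits per transmission."""
    return ab / 2.0 * w0(S / (ab / 2.0 * theta2)) ** 2 / math.log(2.0)

def capacity_waterfilling(S, theta2, ab):
    """Exact water-filling over the subchannels of (9), bits per transmission."""
    delta = 0.5 * math.log((ab + 1.0) / (ab - 1.0))      # arccoth(alpha*beta)
    noise = [theta2 * math.exp(delta)]                     # sigma_0^2
    level = S + noise[0]
    while level > theta2 * math.exp((2 * len(noise) + 1) * delta):
        noise.append(theta2 * math.exp((2 * len(noise) + 1) * delta))
        level = (S + sum(noise)) / len(noise)
    return sum(0.5 * math.log(level / n) for n in noise) / math.log(2.0)

theta2, s = 0.01, 2.0
rel_err = {}
for ab in (10.0, 100.0, 1000.0):
    S = s * ab / 2.0 * theta2            # keeps s fixed, i.e. S proportional to alpha*beta
    exact, asym = capacity_waterfilling(S, theta2, ab), capacity_eq13(S, theta2, ab)
    rel_err[ab] = abs(exact - asym) / exact
    print(f"alpha*beta={ab:6.0f}  water-filling={exact:9.4f}  Eq.(13)={asym:9.4f}  rel. diff={rel_err[ab]:.1e}")
```

In this setting, the relative difference should already be well below one percent for αβ = 10 and shrink further as αβ grows.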
Figure 6: Lambert W function: branches W_0(x), W_{−1}(x) and related functions



Remark 1 In [38] the increment 1/K for a similar transition to an integral as in (15) is used (the parameter K in [38] corresponds to the number K of subchannels in our paper). This would not be an appropriate choice in the above proof, since in our case [see also Eq. (24) below] K depends on input energy (or SNR).

Remark 2 In the above proof, the heat channel was reduced to a finite-dimensional vector Gaussian channel. For such channels, a coding theorem is available [7, Theorem 7.5.2], [10], i.e., the capacity C(S) in Eq. (17) can be achieved by a sequence of codes. Therefore, the capacity given by Eq. (13) also has an operational meaning.

We discuss the case 0 < S = S(αβ) ∝ αβ in more detail. First, when β > 0 is held constant, then S = αβS_1 = αP with some constant S_1 > 0 and P = βS_1. Assume that one transmission takes time α (in seconds). Forming the limit lim_{α→∞} C(S)/α = C turns Eq. (13) into the true equation

$$C = \frac{\beta}{2}\left[w_0\!\left(\frac{P}{(\beta/2)\,\theta^2}\right)\right]^2 \log_2 e \quad \text{(bit/s)}. \tag{19}$$

Next, let β → ∞. Since w_0(0) = 0 and (w_0(x))² is differentiable at x = 0 with derivative 1/2, it follows that

$$\lim_{\beta\to\infty} C = \frac{P}{2\theta^2}\,\log_2 e \quad \text{(bit/s)}. \tag{20}$$
Figure 7: Spectral efficiencies of the heat channel, the bandlimited Gaussian channel, and the Gallager channel as a function of SNR. For small SNR, the spectral efficiency of the heat channel is smaller than that of the Gallager channel, which cannot be seen at the present resolution.

Finally, we compare (19) with the capacity of the bandlimited Gaussian channel of bandwidth W and one-sided noise PSD N_0, given by Shannon's classic formula [9]

$$C = W \log_2\!\left(1 + \frac{P}{W N_0}\right) \quad \text{(bit/s)}. \tag{21}$$

Rewrite the latter equation as C(SNR) = W log_2(1 + SNR), where SNR = P/(W N_0). In the case of the heat channel, it is consistent to set SNR = P/(W N_0), where W = β/2 (cf. Section 1) and N_0 = 2θ² is the one-sided noise PSD (cf. Section 2.2), and to rewrite Eq. (19) as C(SNR) = W [w_0(2·SNR)]² log_2 e. In Fig. 7, the corresponding spectral efficiencies C(SNR)/W are plotted as a function of SNR. As underpinned by Fig. 8, we observe that the spectral efficiency of the heat channel eventually becomes larger than that of the bandlimited Gaussian channel. On the other hand, the capacity limit in (20) is exactly the same as for a Gaussian channel with infinite bandwidth, average input power P and one-sided noise PSD N_0 = 2θ² (cf., e.g., [10, (9.63)]).
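The comparison plotted in Fig. 7 can be reproduced qualitatively from the two closed forms C/W = [w_0(2·SNR)]² log_2 e and C/W = log_2(1 + SNR). The following sketch (pure Python; all helper names are ours, and the Gallager channel is not included) tabulates both and locates their crossing by bisection on a dB scale, assuming a single crossing between 0 dB and 40 dB.

```python
import math

def w0(x):
    """Inverse of y -> (2y - 1) exp(2y) + 1 on y >= 0 (bisection, x >= 0)."""
    def g(y):
        return (2.0 * y - 1.0) * math.exp(2.0 * y) + 1.0
    lo, hi = 0.0, 1.0
    while g(hi) < x:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def se_heat(snr):
    """C/W of the heat channel, [w0(2 SNR)]^2 log2(e), from Eq. (19) with W = beta/2."""
    return w0(2.0 * snr) ** 2 / math.log(2.0)

def se_bandlimited(snr):
    """C/W of the bandlimited Gaussian channel, log2(1 + SNR), from Eq. (21)."""
    return math.log2(1.0 + snr)

def crossover_db(lo_db=0.0, hi_db=40.0, iters=60):
    """SNR in dB at which the two spectral efficiencies coincide (bisection)."""
    def diff(db):
        snr = 10.0 ** (db / 10.0)
        return se_heat(snr) - se_bandlimited(snr)
    for _ in range(iters):
        mid = 0.5 * (lo_db + hi_db)
        if diff(mid) < 0.0:
            lo_db = mid
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

for db in (0, 10, 20, 30, 40):
    snr = 10.0 ** (db / 10.0)
    print(f"{db:2d} dB: heat {se_heat(snr):6.3f}, bandlimited {se_bandlimited(snr):6.3f} bit/s/Hz")
print(f"crossover near {crossover_db():.1f} dB")
```

The heat channel starts below the bandlimited channel and overtakes it at an SNR of roughly 22 dB, after which its spectral efficiency grows like the square of the logarithm of SNR rather than the logarithm itself.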

3.2 Optimal Signaling for the Continuous-Time Channel 

We return to our original, by now refined, continuous-time model of the heat channel, consisting of the LTV filter (4) as its first component, followed by AWGN and a final measurement step. For a fixed average input energy S > 0 and noise variance θ² > 0, the capacity achieving (optimal) input signals are now waveforms

$$f(t) = \sum_{k=0}^{K-1} x_k\,(D_\gamma\psi_k)(t), \qquad t \in \mathbb{R}, \tag{22}$$

where the coefficients x_k are realizations of independent Gaussian random variables X_k ~ 𝒩(0, σ² − σ_k²), k = 0, ..., K − 1, as given in the proof of Theorem 1. The corresponding ("optimal") filter output signal is

$$g(t) = \sum_{k=0}^{K-1} y_k\,(D_\gamma\psi_k)(t), \qquad t \in \mathbb{R}, \tag{23}$$

where the coefficients y_k = p^{k+1/2} x_k are realizations of independent Gaussian random variables Y_k ~ 𝒩(0, σ̃_k² − θ²), σ̃_k² = σ² p^{2k+1}, k = 0, ..., K − 1.

Figure 8: Spectral efficiencies of the heat channel, the bandlimited Gaussian channel, and the Gallager channel plotted against 10 log_10(E_b/N_0); E_b is the average input energy per bit, N_0 the one-sided noise PSD of the AWGN. In the vicinity of the origin, the spectral efficiency of the heat channel is smaller than that of the Gallager channel, which is hard to see at the present resolution.
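A realization of the optimal signals (22) and (23) can be synthesized directly from the water-filling solution. The sketch below is an illustration under our own assumptions (numpy; hypothetical helper names; Hermite functions generated by the standard stable three-term recurrence), using the parameters of the example in Fig. 9; by orthonormality of the basis, the energy of f must equal Σ x_k², and the input variances must add up to S.

```python
import numpy as np

def hermite_functions(K, x):
    """Orthonormal Hermite functions psi_0, ..., psi_{K-1} on the grid x (three-term recurrence)."""
    psi = np.zeros((K, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-x**2 / 2.0)
    if K > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, K - 1):
        psi[k + 1] = np.sqrt(2.0 / (k + 1)) * x * psi[k] - np.sqrt(k / (k + 1)) * psi[k - 1]
    return psi

def water_level(S, noise):
    """Water level sigma^2 and number K of active subchannels for increasing noise variances."""
    K_all = np.arange(1, noise.size + 1)
    csum = np.cumsum(noise)
    slack = S - (K_all * noise - csum)      # positive exactly for the active prefix K = 1, ..., K*
    K = int(np.count_nonzero(slack > 0))
    return (S + csum[K - 1]) / K, K

def draw_optimal_signals(S, theta2, alpha, beta, t, rng):
    """One realization of the optimal input (22) and the filter output (23) on the grid t."""
    delta = np.arctanh(1.0 / (alpha * beta))                  # arccoth(alpha*beta)
    p, gamma = np.exp(-delta), np.sqrt(alpha / beta)
    k = np.arange(int(25.0 / delta) + 1)                      # keeps noise exponents below ~50
    noise = theta2 * np.exp((2 * k + 1) * delta)              # sigma_k^2 = theta^2 p^{-(2k+1)}
    level, K = water_level(S, noise)
    var_x = level - noise[:K]                                 # Var X_k = sigma^2 - sigma_k^2
    x = rng.normal(0.0, np.sqrt(var_x))
    basis = hermite_functions(K, t / gamma) / np.sqrt(gamma)  # rows: (D_gamma psi_k)(t)
    f = x @ basis
    g = (p ** (k[:K] + 0.5) * x) @ basis                      # y_k = p^{k+1/2} x_k
    return f, g, x, var_x

rng = np.random.default_rng(seed=0)
alpha, beta, theta2, S = 1.0, 100.0, 0.01, 1.0                # parameters of Fig. 9
t = np.linspace(-3.0, 3.0, 3001)
dt = t[1] - t[0]
f, g, x, var_x = draw_optimal_signals(S, theta2, alpha, beta, t, rng)
print("active subchannels K =", x.size)
print("sum of input variances:", var_x.sum())
print("energy of f:", np.sum(f**2) * dt, "vs sum of x_k^2:", np.sum(x**2))
```

The constant offset τ_0 for intensity modulation (Section 2.3) is deliberately left out here; it would simply be added to f before transmission.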

The following numerical example may serve as illustration. In Fig. |9l a pair 
of optimal filter input/output signals is displayed. Since transmission through 
an optical fiber by intensity modulation is supposed, the input signal f{t) is 
modified as described in Section [2T3l By means of Eq. (fT3|). the capacity of the 
concrete heat channel is found to be approximately 29.47 bits per transmission 
(in the given precision, this figure agrees with the capacity found by numerical 
computation). Observe that the input signal f{t) in Fig. [5] is practically of finite 
duration and confined to the time interval [— Tra, 7ra] (or its spatial equivalent) 
as is the filter output signal g{t). Now assume that a signal such as f{t) is 
generated and sent every time a, the average input power then being P = S/a. 
When the ratio x/t for conversion from the temporal (electrical) domain to the 
spatial (optical) domain is 27r m/s, a signal of effective duration of at most 1 
second (as in the example in Fig. [9|) would result in an optical waveform of 
Figure 9: Optimal waveforms for the heat channel with parameters $\alpha = 1$ s, $\beta = 100$ Hz, $\theta^2 = 0.01$ W/Hz, $S = 1.0$ Ws. (a) Physical input signal $\tau_0 + f(t)$, $\tau_0 = 2$ [no unit], to an optical fiber of length $L$; intensity modulation assumed. (b) Corresponding filter output signal $g(t)$. Gaussian time window enlarged. Signals displayed in the spatial domain; ratio $x/t$ of order 1 m/s, say.



length of at most 6.28 meters. Then, given a speed of light of 200,000 km/s in the fiber, interference between successive waveforms in the fiber would, of course, be of no importance at all. As a consequence, a rate of 29.47 bit/s would be attained; the same value is found by Eq. In the present example, $K = 64$ subchannels are needed (as numerically determined).

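Now, let $\alpha\beta \to \infty$. Under the assumption of Theorem 1, the number $K$ of active subchannels is found to be

$$K \sim \alpha\beta\,\ln\frac{\sigma}{\theta}. \qquad (24)$$

For the sake of simplicity, assume further, as above, that $0 < S(\alpha\beta) \propto \alpha\beta$. Then, by (15), we infer that $\sigma^2$ approaches a finite water level $\sigma_\infty^2 > \theta^2$ as $\alpha\beta \to \infty$. As a consequence, $K \to \infty$ and $\sigma_0^2 - \theta^2 \to \sigma_\infty^2 - \theta^2$, where we have set $\sigma_k^2 = \sigma^2\rho^{2k+1}$.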

When $\sigma_\infty^2$ is large compared to $\theta^2$, the bias $\theta^2$ in $\sigma_k^2 - \theta^2$ may be neglected, so that the optimal filter output signal (23) comes close to a white Gaussian noise response; cf. (52) (below) and Appendix A. Then the signal model of Section is almost met, and we may take (and shall do so in the sequel) the time-frequency product $\alpha\beta$ as the DoF of optimal filter output signals.
3.3 Channel Capacity by Water-Filling in the Time-Frequency Plane, and a Classic Result of Gallager

By means of a Szegő theorem (viz. Theorem 6 in Appendix B), the above water-filling solution carries over to the time-frequency plane. The classic water-filling solution for the capacity of a power-constrained additive Gaussian noise channel goes back to Shannon [39] and has been stated and proved by Gallager [7] in full generality.

3.3.1 Capacity of the Heat Channel by Water-Filling in Time and Frequency

Referring to Gallager's result [7, Theorem 8.5.1] in the form given in [10, (9.97)], we define

$$N(t, \omega) := \frac{\theta^2}{2\pi}\,(\cosh\delta)\exp\left(\frac{t^2}{\alpha^2} + \frac{\omega^2}{\beta^2}\right).$$
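Now we are in a position to state

Theorem 2 Under the same assumption on the average input energy $S$ as in Theorem 1, the capacity (in bits per transmission) of the heat channel is given by

$$C = \frac{1}{2\pi}\iint \frac{1}{2}\log_2^{+}\frac{\nu}{N(t, \omega)}\,dt\,d\omega, \qquad (25)$$

where $\nu$ is chosen so that

$$S = \iint \left(\nu - N(t, \omega)\right)^{+}\,dt\,d\omega. \qquad (26)$$

Proof: See Appendix B. □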

Note that the bivariate function $N(t, \omega)$ is proportional to the reciprocal of the WVS $\Phi(t, \omega)$ in (12). Since $N(t, \omega)$ has the form of a "cup," Theorem 2 is a water-filling theorem in a very real sense.



3.3.2 Comparison with Gallager's Result

When $\alpha \to \infty$, the continuous-time heat channel [cf. Eq. (6) and Fig. 2(a)] appears to tend towards an LTI channel according to Gallager's model¹ with an LTI filter with impulse response $h_1(t)$. It is therefore worthwhile to compare Theorems 1 and 2 with Gallager's classic result [7, Theorem 8.5.1] when the latter is applied to that particular LTI channel. According to Gallager's theorem, the capacity $C$ (now a rate measured in bits per second) at input power $P$ is given

¹In the statement of [7, Theorem 8.5.1] the crucial assumption "$T \to \infty$" is missing. As a consequence, in [7, Figure 8.5.1] the restriction of the input $x(t)$ to a bounded time interval $(-T/2, T/2)$ drops out. For this reason, in Fig. 2(a) any time constraint on the input is omitted.
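parametrically by

$$C = \int_{-\infty}^{\infty}\frac{1}{2}\log_2^{+}\frac{|H_1(f)|^2 B}{N_0/2}\,df, \qquad P = \int_{-\infty}^{\infty}\left(B - \frac{N_0/2}{|H_1(f)|^2}\right)^{+}df,$$

where $H_1(f) = \int_{-\infty}^{\infty} e^{-i2\pi f t}h_1(t)\,dt$ is the frequency response of the filter and $N_0/2$ is the two-sided noise PSD of the AWGN. For the function $h_1(t)$ at hand and AWGN with $N_0/2 = \theta^2$, we obtain after a calculation

$$C = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1}{2}\log_2^{+}\frac{\nu_1}{N_1(\omega)}\,d\omega, \qquad (27)$$

$$P = \int_{-\infty}^{\infty}\left(\nu_1 - N_1(\omega)\right)^{+}d\omega, \qquad (28)$$

where $\nu_1 = B/(2\pi)$ is the new parameter, $\omega = 2\pi f$ is the integration variable, and

$$N_1(\omega) = \frac{\theta^2}{2\pi}\exp\left(\frac{\omega^2}{\beta^2}\right). \qquad (29)$$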

We observe a formal analogy between the preceding water-filling formulas (27), (28) and (25), (26) in Theorem 2. Moreover, it holds that $N(t, \omega) \to N_1(\omega)$ as $\alpha \to \infty$ for any $t, \omega$ held constant.

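Now, from (27) and (28) we readily obtain

$$\frac{C}{\beta/2} = \frac{2}{3\pi}\,(\ln r)^{3/2}\log_2 e \ \ \text{bit/s/Hz}, \qquad (30)$$

$$\frac{P}{\beta/2} = \frac{2\theta^2}{\pi}\left[r\sqrt{\ln r} - \int_0^{\sqrt{\ln r}} e^{u^2}\,du\right], \qquad (31)$$

where $r = B/\theta^2$ is again a new parameter. With the setting $\mathrm{SNR} = P/(WN_0)$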
where $W = \beta/2$ and $N_0 = 2\theta^2$ (the same setting as for the heat channel), we thus obtain a parametric representation of a curve in the ($C/W$, SNR) plane. In Fig. 7 the spectral efficiency $C/W$ is plotted as a function of SNR. It came as a big surprise to us that the spectral efficiency of the above Gallager channel is inferior to that of the heat channel in the limiting case $\alpha \to \infty$ (as inferred from Theorem 1), at least when SNR is not too small. A similar behaviour can be observed in Fig. 8.

We remark further that it seems elusive to find a closed-form expression for $C/W$ as a function of SNR from Eqs. (30), (31); suffice it to say that, to the best of our knowledge, the integral in (31) cannot be expressed in terms of elementary functions [whereas it is easy to evaluate the double integral in (26)].
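To see why the capacity side of the Gallager water-filling has a closed form while the power side does not, one can evaluate integrals of the form (27) and (28) numerically for a Gaussian "cup" $N_1(\omega) = n_0\exp(\omega^2/b^2)$. In this sketch $n_0$ and $b$ are free illustrative parameters (the choice $n_0 = \theta^2/(2\pi)$, $b = \beta$ corresponds to (29)), and the helper names are hypothetical. The capacity integrand is a quadratic in $\omega$, so Simpson's rule reproduces $C = b(\ln r)^{3/2}\log_2 e/(3\pi)$ with $r = \nu_1/n_0$, whereas the power integral contains $\int e^{u^2}du$ and is left to quadrature.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for f on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def gaussian_cup_waterfill(nu, n0, b):
    """Water-filling integrals of the form (27)-(28) for an assumed profile
    N1(omega) = n0 * exp(omega**2 / b**2).

    Returns (C, P) with C = (1/2pi) * int 1/2 log2+(nu / N1) d omega and
    P = int (nu - N1)+ d omega; both integrands vanish outside |omega| < w0.
    """
    if nu <= n0:
        return 0.0, 0.0
    w0 = b * math.sqrt(math.log(nu / n0))   # band edge where N1(w0) = nu

    def N1(w):
        return n0 * math.exp(w * w / (b * b))

    C = simpson(lambda w: 0.5 * math.log2(nu / N1(w)), -w0, w0) / (2 * math.pi)
    P = simpson(lambda w: nu - N1(w), -w0, w0)
    return C, P


def capacity_closed_form(r, b):
    """C = b (ln r)^(3/2) log2(e) / (3 pi), with r = nu / n0."""
    return b * math.log(r) ** 1.5 * math.log2(math.e) / (3 * math.pi)


if __name__ == "__main__":
    n0, b = 0.01 / (2 * math.pi), 100.0     # illustrative values
    for r in (2.0, 10.0, 100.0):
        C, P = gaussian_cup_waterfill(r * n0, n0, b)
        print(f"r = {r:6.1f}: C = {C:.4f} (closed form {capacity_closed_form(r, b):.4f}), P = {P:.6f}")
```

Dividing the closed form by $b/2$ gives the shape $\frac{2}{3\pi}(\ln r)^{3/2}\log_2 e$ of (30) when $b = \beta$.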
4 Rate Distortion Function of a Connected Gaussian Process



For a well-rounded treatment of the capacity problem for the heat channel, it is expedient to investigate a dual problem, which is a topic of rate distortion theory. To this end, consider the nonstationary zero-mean Gaussian process defined by the Karhunen-Loève expansion

$$X(t) = \sum_{k=0}^{\infty} X_k\,(D_\gamma\psi_k)(t), \quad t \in \mathbb{R}, \qquad (32)$$

where the coefficients $X_k$, $k = 0, 1, \ldots$, are independent Gaussian random variables $\sim \mathcal{N}(0, \sigma_k^2)$ of variance $\sigma_k^2 = \sigma^2\rho^{2k+1}$, $\sigma > 0$; it is the response of filter $P_\delta^{(\gamma)}$ to white Gaussian noise [cf. (52) in Appendix A]. In Fig. 3 the area beneath the curve $y = \sigma^2 e^{-x}$ corresponds to the average energy

$$E = \sum_{k=0}^{\infty}\sigma_k^2 = \frac{\sigma^2\rho}{1 - \rho^2} \sim \frac{\alpha\beta}{2}\,\sigma^2 \qquad (33)$$

of the Gaussian process (32). The parameter $\theta^2$ in Fig. 3 will now have the interpretation of a "(ground-)water table"; in the concept of a test channel [10, p. 311], it would again have the meaning of a noise variance. In this section, information will be measured in nats.



4.1 Rate Distortion Function in Closed Form 

Substitute the continuous-time Gaussian process $\{X(t), t \in \mathbb{R}\}$ in (32) by the sequence of coefficient random variables $X = X_0, X_1, \ldots$. For an estimate $\hat X = \hat X_0, \hat X_1, \ldots$ of $X$ we take the mean-square error $D = \mathbb{E}\{\sum_{k=0}^{\infty}(X_k - \hat X_k)^2\}$ as distortion measure.

The function $y = w_{-1}(x)$, $0 < x \leq 1$, occurring in the next theorem is the inverse function of $y = (2x + 1)e^{-2x}$, $x \geq 0$ (see also Fig. 6). The Landau symbol $\Omega$ is defined for any two functions as in Definition 2 as follows: $A(x) = \Omega(B(x))$ as $x \to \infty$ if $B(x) > 0$ and $\liminf_{x \to \infty} A(x)/B(x) > 0$.

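Theorem 3 Assume that the foregoing average distortion $D$ depends on $\alpha\beta$ such that $D(\alpha\beta) = \Omega(\alpha\beta)$ as $\alpha\beta \to \infty$. Then the rate distortion function of the Gaussian process (32) satisfies

$$R(D) \sim \frac{\alpha\beta}{2}\left[w_{-1}\!\left(\frac{D}{(\alpha\beta/2)\sigma^2}\right)\right]^2 \qquad (34)$$

if $D < (\alpha\beta/2)\sigma^2$, and $R(D) = 0$ otherwise.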



Proof: Let $E$ be the average energy (33) of the Gaussian process (32).

First, assume $D < E$. The reverse water-filling argument for a finite number of independent Gaussian sources [10] carries over to our case without changes, resulting in a finite collection of Gaussian sources $X_0, \ldots, X_{K-1}$, where $K = \max\{k \in \mathbb{N};\ \sigma_{k-1}^2 > \theta^2\}$ and the water level $\theta^2 > 0$ is defined by the condition

$$D = \sum_{k=0}^{\infty}\min\{\theta^2, \sigma_k^2\} \qquad (35)$$

(cf. Fig. 3, where $D = E_{\mathrm{dist1}} + E_{\mathrm{dist2}}$). Consequently,

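$$D\,\delta = \sum_{k=0}^{\infty}\min\{\theta^2, \sigma^2 e^{-2k\delta}e^{-\delta}\}\,\delta = \frac{1}{2}\int_0^{\infty}\min\{\theta^2, \sigma^2 e^{-x}\}\,dx + \epsilon, \qquad (36)$$

where $\epsilon \to 0$ as $\alpha\beta \to \infty$. Observe that, by the growth condition imposed on $D$ and since $\delta \sim 1/(\alpha\beta)$, the water level $\theta^2$ eventually remains above a positive lower bound as $\alpha\beta \to \infty$. Evaluation of the integral yields

$$\frac{D}{\alpha\beta} = \frac{\theta^2}{2}\left(\ln\frac{\sigma^2}{\theta^2} + 1\right) + \epsilon', \qquad (37)$$

where $\epsilon' \to 0$ as $\alpha\beta \to \infty$.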



The rate distortion function is parametrically given by [10]

$$R = \sum_{k=0}^{K-1}\frac{1}{2}\ln\frac{\sigma_k^2}{\theta^2}. \qquad (38)$$

The right-hand sides of Eqs. (38) and (11) agree. Since $\sigma^2/\theta^2$ is eventually finitely upper bounded, transition to a Riemann integral is allowed, and we obtain exactly as in the proof of (18) that

$$R \sim \frac{\alpha\beta}{2}\left(\ln\frac{\sigma}{\theta}\right)^2. \qquad (39)$$



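Eq. (37) is equivalent to

$$\frac{\theta^2}{e\sigma^2}\ln\frac{\theta^2}{e\sigma^2} = -e^{-1}d + \epsilon_1,$$

where $d = D/[(\alpha\beta/2)\sigma^2]$ and $\epsilon_1 \to 0$ as $\alpha\beta \to \infty$. By means of the branch $W_{-1}$ of the Lambert W function [37], which is the uniquely determined analytic function satisfying $W_{-1}(x)\exp[W_{-1}(x)] = x$ for all $x \in [-e^{-1}, 0)$ and $W_{-1}(-e^{-1}) = -1$, $W_{-1}(x) \to -\infty$ as $x \to 0-$ (see Fig. 6), we get after a computation

$$\frac{1}{4}\left(\ln\frac{\sigma^2}{\theta^2}\right)^2 = \left[w_{-1}(d + \epsilon_1')\right]^2,$$

where we have set $w_{-1}(x) = -\frac{1}{2}\left[1 + W_{-1}(-x/e)\right]$, $0 < x \leq 1$, and $\epsilon_1' = -e\epsilon_1$.

Because of Eq. (39), this gives rise to

$$\frac{R}{\alpha\beta} = \frac{1}{2}\left[w_{-1}(d + \epsilon_1')\right]^2 + \epsilon_2,$$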
where $\epsilon_2 \to 0$ as $\alpha\beta \to \infty$. Unlike $w_{-1}(x)$, which has a vertical tangent at $x = 1$, the function $(w_{-1}(x))^2$ has a continuous and bounded derivative in any closed interval $[d_0, 1] \subset (0, 1]$. Consequently, by the mean value theorem, $R/(\alpha\beta) = \frac{1}{2}[w_{-1}(d)]^2 + \epsilon'' + \epsilon_2$, where $\epsilon'' + \epsilon_2 \to 0$ as $\alpha\beta \to \infty$ uniformly for all $d \in [d_0, 1]$. This proves the first part of the theorem.

Now, suppose $D \geq E$. Since $\mathbb{E}\{\sum_{k=0}^{\infty}(X_k - 0)^2\} = E \leq D$, the constant sequence $\hat X = 0, 0, \ldots$ is a sufficient estimate. Since there is no uncertainty about the members of that deterministic sequence, no information needs to be supplied; thus, $R = 0$. This proves the second part of the theorem. □
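The reverse water-filling in the proof of Theorem 3 and the closed form (34) can be compared numerically. The sketch below uses hypothetical helper names and plain Python: it computes $w_{-1}$ by bisection as the inverse of $y \mapsto (2y+1)e^{-2y}$ (rather than calling a Lambert W routine), solves (35) for the water level $\theta^2$ by bisection, and evaluates (38). As before, $\rho = e^{-\delta}$ with $\tanh\delta = 1/(\alpha\beta)$ is an assumption made only for the example.

```python
import math


def w_minus1(x, tol=1e-14):
    """Inverse of y -> (2y + 1) * exp(-2y) on y >= 0, defined for 0 < x <= 1.

    Equals -(1 + W_{-1}(-x/e)) / 2 with the Lambert W branch W_{-1};
    computed here by bisection so that no special-function library is needed.
    """
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")

    def f(y):
        return (2 * y + 1) * math.exp(-2 * y) - x   # decreasing for y >= 0

    lo, hi = 0.0, 1.0
    while f(hi) > 0:                 # enlarge the bracket until f changes sign
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reverse_waterfill_rate(D, sigma2, rho, iters=80):
    """Reverse water-filling for independent X_k ~ N(0, sigma2 * rho**(2k+1)).

    Solves sum_k min(theta2, sigma_k^2) = D for the water level theta2, cf. (35),
    and returns (theta2, R) with R = sum over sigma_k^2 > theta2 of
    1/2 ln(sigma_k^2 / theta2) in nats, cf. (38). Requires 0 < D < sum_k sigma_k^2.
    """
    variances = []
    v = sigma2 * rho
    while v > 1e-12 * sigma2:        # truncate the negligible tail of the expansion
        variances.append(v)
        v *= rho * rho
    lo, hi = 0.0, variances[0]       # theta2 lies between 0 and the largest variance
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(min(mid, s) for s in variances) < D:
            lo = mid
        else:
            hi = mid
    theta2 = 0.5 * (lo + hi)
    R = sum(0.5 * math.log(s / theta2) for s in variances if s > theta2)
    return theta2, R


if __name__ == "__main__":
    ab, sigma2 = 200.0, 1.0
    rho = math.exp(-math.atanh(1 / ab))      # assumption: tanh(delta) = 1/(alpha*beta)
    for d in (0.1, 0.3, 0.7):
        _, R = reverse_waterfill_rate(d * (ab / 2) * sigma2, sigma2, rho)
        print(f"d = {d}: R = {R:.2f} nats, (34) gives {ab / 2 * w_minus1(d) ** 2:.2f}")
```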

Observe that although the functions $w_{-1}(x)$ in Eq. (34) and $w_0(x)$ in the capacity formula of Theorem 1 seem to be completely different, they are linked by the two branches of the Lambert W function (cf. Fig. 6).

4.2 Reverse Water-Filling in the Time-Frequency Plane

Before continuing with our main theme, we present a parametric representation of the rate distortion function occurring in Theorem 3, since the means for its proof are now available (cf. Theorem 6 in Appendix B). This representation may be viewed as an extension to the time-frequency plane of the classic method of reverse water-filling (cf., e.g., [55], [10]) due to Kolmogorov [30]. It turns out that the role of the PSD in [28, Theorem 4.5.4] is now taken by the WVS $\Phi(t, \omega)$ in (12). We obtain

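Theorem 4 The rate distortion function $R(D)$ of the nonstationary Gaussian process (32) has in the interval $0 < D < (\alpha\beta/2)\sigma^2$ the parametric representation

$$D_\lambda = \iint \min\{\lambda, \Phi(t, \omega)\}\,dt\,d\omega, \qquad R_\lambda = \frac{1}{2\pi}\iint \frac{1}{2}\ln^{+}\frac{\Phi(t, \omega)}{\lambda}\,dt\,d\omega.$$

Proof: See Appendix B. □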
Note that

$$\iint \Phi(t, \omega)\,dt\,d\omega = \frac{\sigma^2\rho}{1 - \rho^2} \sim \frac{\alpha\beta}{2}\,\sigma^2$$

is the average energy of the nonstationary Gaussian process (32), as it should be. We observe that the representation in Theorem 4 is in perfect analogy to the parametric representation [28, (4.5.51), (4.5.52)] of the $R(D)$ function of a continuous-time stationary Gaussian source.

5 Channel Capacity in Terms of Channel Output, and Relation to Estimation Theory

Now, we adopt the perspective of the receiver. This will result in a second closed-form capacity formula for the heat channel, now in terms of elementary functions and akin to the classic Shannon formula for the capacity of a bandlimited Gaussian channel. Moreover, we shall find a parallel to the celebrated I-MMSE relationship [see Eq. (44) below] due to Guo et al. [52]. Implications for the capacity of multiple-input multiple-output (MIMO) systems will be indicated. In the present section, it is convenient to use natural logarithms; therefore, unless otherwise stated, information is measured in nats.

5.1 Channel Capacity — Second Formula 

Recall the representation (23) of an optimal filter output signal $g$. In the next theorem, the capacity of the heat channel will be expressed in terms of the average energy $E_{\mathrm{out}}$ of the measured filter output signal

$$\tilde g(t) = \sum_{k=0}^{K-1}\tilde y_k\,(D_\gamma\psi_k)(t), \quad t \in \mathbb{R},$$

where $\tilde y_k = y_k + n_k$, $y_k = \rho^{k+1/2}x_k$, and $x_k$ is as in (22); the measurement errors $n_k$ are realizations of i.i.d. Gaussian random variables $N_k \sim \mathcal{N}(0, \theta^2)$ (as in Section 2.2). Notice that here we assumed the model of an inaccurate measurement of the coefficients $y_k$ of the noiseless filter output signal (23); we shall maintain this model of measurement throughout Section 5.

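Theorem 5 Assume that the average energy $E_{\mathrm{out}}$ of the measured filter output signal depends on $\alpha\beta$ such that $E_{\mathrm{out}}(\alpha\beta) = O(\alpha\beta)$ as $\alpha\beta \to \infty$. Then for the capacity (in bits per transmission) of the heat channel it holds

$$C \sim \frac{\alpha\beta}{8}\left[\ln\left(1 + \frac{E_{\mathrm{out}}}{(\alpha\beta/2)\,\theta^2}\right)\right]^2\log_2 e. \qquad (40)$$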
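Proof: Rewrite Eq. (11) as

$$C(S) = \sum_{k=0}^{K-1}\frac{1}{2}\ln\frac{\sigma_k^2}{\theta^2} \quad \text{(nat)}, \qquad (41)$$

and set $\sigma_k^2 = \sigma^2\rho^{2k+1}$, $k = 0, 1, \ldots$. In case of optimal signaling, the average energy of the measured filter output signal is $E_{\mathrm{out}} = \bar E_{\mathrm{out}} + E_{\mathrm{err}}$ (cf. Fig. 4), where $\bar E_{\mathrm{out}} = \sum_{k=0}^{K-1}(\sigma_k^2 - \theta^2)$ is the average energy of the (noiseless) filter output signal, $E_{\mathrm{err}} = \sum_{k=0}^{K-1}\theta^2 = K\theta^2$ is the average energy of the measurement error, and $K$ is given by $K = \max\{k \in \mathbb{N};\ \sigma_{k-1}^2 > \theta^2\}$ (coinciding with the number $K$ of active subchannels in the proof of Theorem 1). Since $E_{\mathrm{out}} = \sum_{k=0}^{K-1}\sigma_k^2$, we have

$$E_{\mathrm{out}}\,\delta = \sum_{k=0}^{K-1}\sigma^2 e^{-(2k+1)\delta}\,\delta = \frac{1}{2}\int_0^{\ln(\sigma^2/\theta^2)}\sigma^2 e^{-x}\,dx + \epsilon = \frac{\sigma^2 - \theta^2}{2} + \epsilon,$$

where $\epsilon \to 0$ as $\alpha\beta \to \infty$. Observe that by the growth condition imposed on $E_{\mathrm{out}}$, $\sigma^2$ remains bounded as $\alpha\beta \to \infty$, justifying in return the transition to a Riemann integral. Hence, since $\delta \sim 1/(\alpha\beta)$,

$$\frac{\sigma^2}{\theta^2} = 1 + \frac{E_{\mathrm{out}}}{(\alpha\beta/2)\,\theta^2} + o(1). \qquad (42)$$

Now, Eq. (40) follows from Eq. (41), evaluated asymptotically as in the proof of (18), and Eq. (42) (after a final transition from nats to bits). □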
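Formula (40) can be checked against exact water-filling in the same illustrative setting as before (assumed $\tanh\delta = 1/(\alpha\beta)$; hypothetical function names). The sketch computes the optimal signaling, the average energy $E_{\mathrm{out}} = \sum_{k<K}\sigma_k^2$ of the measured output (noiseless part plus the measurement-error energy $K\theta^2$), and compares the exact capacity with the right-hand side of (40).

```python
import math


def optimal_signaling(S, theta2, rho, kmax=1_000_000):
    """Water-filling as in the proof of Theorem 1 (sketch).

    Returns (K, sigma2, C_bits, E_out), where E_out = sum_{k<K} sigma_k^2 is the
    average energy of the measured filter output: the noiseless part
    sum_k (sigma_k^2 - theta2) plus the measurement-error energy K * theta2.
    """
    noise_sum = 0.0
    for K in range(1, kmax + 1):
        noise_sum += theta2 * rho ** -(2 * K - 1)
        sigma2 = (S + noise_sum) / K
        if sigma2 <= theta2 * rho ** -(2 * K + 1):
            break
    sigma_k2 = [sigma2 * rho ** (2 * k + 1) for k in range(K)]   # sigma_k^2, k < K
    C = sum(0.5 * math.log2(s / theta2) for s in sigma_k2)
    return K, sigma2, C, sum(sigma_k2)


def capacity_from_output_energy(E_out, theta2, ab):
    """Right-hand side of (40): capacity in bits per transmission from E_out."""
    return ab / 8 * math.log(1 + E_out / (ab / 2 * theta2)) ** 2 * math.log2(math.e)


if __name__ == "__main__":
    ab, theta2 = 100.0, 0.01
    rho = math.exp(-math.atanh(1 / ab))      # assumption: tanh(delta) = 1/(alpha*beta)
    for S in (5.0, 50.0, 500.0):
        K, sigma2, C, E_out = optimal_signaling(S, theta2, rho)
        print(f"S = {S}: exact C = {C:.2f} bits, (40) gives "
              f"{capacity_from_output_energy(E_out, theta2, ab):.2f} bits")
```

The receiver-side formula needs only $E_{\mathrm{out}}$, $\theta^2$ and $\alpha\beta$, which is the point of Theorem 5.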

Note that for the determination of channel capacity by formula (40), the receiver does not need to know the number $K$ of active subchannels beforehand since, at least in principle, it could easily be estimated as accurately as desired from successive optimal channel uses at constant average input energy.

In the rest of this section we shall always suppose that $K$ is asymptotically given by Eq. (24), tacitly assuming that the assumptions of Theorem 1 (or Theorem 5) are fulfilled.



5.2 Relation to Estimation Theory

The setting of the previous subsection gives rise to a vector Gaussian channel

$$Y^K = H^K X^K + N^K, \qquad (43)$$

where the matrix $H^K$ is the $K \times K$ diagonal matrix with entries $h_{kk} = \rho^{k+1/2}$, $k = 0, \ldots, K-1$, $X^K$ is a random input vector, $N^K$ a noise vector with random components $N_k$ as above, and $Y^K$ the resulting output vector. In the language of vectors, in Section 5.1 the vector $\tilde y^K = (\tilde y_0, \ldots, \tilde y_{K-1})^{\mathsf T}$ was obtained by measurement of the (noiseless) vector $y^K = H^K x^K$, where $x^K = (x_0, \ldots, x_{K-1})^{\mathsf T}$ was the input. Conversely, we could think of $\tilde y^K$ as a perturbed version of $y^K$ and aim to estimate $y^K$ from $\tilde y^K$. Linear least-squares estimation would result in $\tilde y^K$ itself, with squared error $\|\tilde y^K - y^K\|^2 = \sum_{k=0}^{K-1}(\tilde y_k - y_k)^2$. Taking the expectation, the linear least-squares error (LLSE) becomes

$$\mathbb{E}\{\|H^K X^K - Y^K\|^2\} = \mathbb{E}\{\|N^K\|^2\} = K\theta^2,$$

which, of course, agrees with the average energy of the measurement error $E_{\mathrm{err}} = \sum_{k=0}^{K-1}\theta^2$ (cf. Fig. 4). So, we may interpret measurement as an estimation problem.
5.2.1 C-LLSE Formula

The central result of the paper [21] is an identity connecting mutual information with the minimum mean-square error (MMSE) of estimation theory. In case of a Gaussian input, this identity reads

$$\frac{d}{d\,\mathrm{snr}}\,I\left(X; \sqrt{\mathrm{snr}}\,HX + N\right) = \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr}), \qquad (44)$$

where $N$ is a noise vector with independent standard Gaussian components, independent of the random vector $X$ with $\mathbb{E}\{\|X\|^2\} < \infty$, and $H$ is a deterministic matrix of appropriate dimension. It is interesting to compare Eq. (44) with the capacity calculations in our paper. We shall denote the inverse and transpose of the matrix $H^K$ by $(H^K)^{-1}$ and $(H^K)^{\mathsf T}$, respectively. Since mutual information is invariant with respect to invertible linear transformations, we infer for the mutual information $I(X^K; Y^K) = I(X^K; X^K + (H^K)^{-1}N^K)$ occurring in Eq. (11) that

$$I(X^K; Y^K) = I\left(X'^K; \frac{\sigma}{\theta}H^K X'^K + N'^K\right), \qquad (45)$$

where the noise vector $N'^K = \theta^{-1}N^K$ has independent standard Gaussian components, independent of the random vector $X'^K = \sigma^{-1}X^K$. If we take $X'^K = (X'_0, \ldots, X'_{K-1})^{\mathsf T}$ with independent components $X'_k \sim \mathcal{N}(0, 1 - (\theta^2/\sigma^2)\rho^{-(2k+1)})$, where $\sigma$ is determined by Eq. (14) or, equivalently, $\sum_{k=0}^{K-1}\left(1 - (\theta^2/\sigma^2)\rho^{-(2k+1)}\right) = S/\sigma^2$, then the left-hand side of Eq. (45) achieves the capacity $C(S)$ of the heat channel. Since $C(S)$ depends only on the signal-to-noise ratio $\mathrm{snr} = \sigma^2/\theta^2 \in [1, \infty)$,² we may write (with slight abuse of notation)

$$C(\mathrm{snr}) = I\left(X'^K; \sqrt{\mathrm{snr}}\,H^K X'^K + N'^K\right), \qquad (46)$$

which is reminiscent of the mutual information in (44).

Now, several problems arise when trying to take the derivative with respect to snr: 1) the probability distribution of the input vector $X'^K$ depends on snr; 2) the function $C(\mathrm{snr})$ is not differentiable at snrs where a new subchannel is added (cf. Fig. 10). To overcome both problems, we substitute $C(\mathrm{snr})$ by its smooth approximation $C_0(\mathrm{snr}) = \frac{\alpha\beta}{2}\left(\ln\sqrt{\mathrm{snr}}\right)^2$ as given by Eq. (18). Then we may state what could be called a C-LLSE formula in the following proposition.

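Proposition 1 Assume $\sigma^2 = 1$ and $0 < \theta^2 < 1$. Then it holds that

$$\frac{d}{d\,\mathrm{snr}}\,C_0(\mathrm{snr}) = \frac{1}{2}\,\mathrm{llse}(\mathrm{snr}), \qquad (47)$$

where $\mathrm{llse}(\mathrm{snr})$ is the LLSE at $\mathrm{snr} = \sigma^2/\theta^2 = 1/\theta^2$.

²Since only the portion $\sigma^2 - \theta^2$ contributes to the signal, $\sigma^2/\theta^2$ is rather a signal-plus-noise-to-noise ratio; we stick to the notation "snr" to conform with [29].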
Figure 10: Capacity $C(\mathrm{snr})$ of the heat channel and its approximation $C_0(\mathrm{snr})$ in case $\alpha\beta = 5$ (for larger values of $\alpha\beta$, the two curves quickly become indistinguishable in the given range of snr). Dotted lines mark the snrs where differentiability of $C(\mathrm{snr})$ breaks down.



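Proof: First, let $\sigma, \theta$ be arbitrary numbers with $0 < \theta < \sigma < \infty$. Taking the derivative of $C_0(\mathrm{snr})$ with respect to $\mathrm{snr} = \sigma^2/\theta^2$, we get, observing Eq. (24),

$$\frac{d}{d\,\mathrm{snr}}\,C_0(\mathrm{snr}) = \frac{\alpha\beta}{2}\cdot\frac{\ln\sqrt{\mathrm{snr}}}{\mathrm{snr}} = \frac{1}{2}\cdot\frac{\theta^2}{\sigma^2}\,\alpha\beta\ln\frac{\sigma}{\theta} = \frac{1}{2}\,K\,\frac{\theta^2}{\sigma^2}.$$

After rescaling $1, \theta/\sigma \leftarrow \sigma, \theta$, the snr is retained and we obtain (47), where $\mathrm{llse}(\mathrm{snr}) = K(\theta/\sigma)^2 \leftarrow K\theta^2$ is the new LLSE at the given snr. □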

After all, Eq. (47) is similar to Eq. (44) again! We remark that the LLSE in Proposition 1 is given by

$$\mathrm{llse}(\mathrm{snr}) \sim \alpha\beta\,\frac{\ln\sqrt{\mathrm{snr}}}{\mathrm{snr}}. \qquad (48)$$



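5.2.2 Comparison of MMSE and LLSE

To recognize the difference between Eqs. (44) and (47), we calculate the MMSE. We continue to suppose that $\mathrm{snr} = \sigma^2/\theta^2 > 1$ (otherwise, $\sigma, \theta$ may be arbitrary positive numbers for the moment). Following [29], given

$$Y'^K = \sqrt{\mathrm{snr}}\,H^K X'^K + N'^K, \qquad (49)$$

the MMSE in estimating $H^K X'^K$ is

$$\mathrm{mmse}(\mathrm{snr}) = \mathbb{E}\left\{\|H^K X'^K - H^K\hat X'^K\|^2\right\} = \mathrm{tr}\left\{H^K\left(\Sigma_{X'}^{-1} + \mathrm{snr}\,(H^K)^{\mathsf T}H^K\right)^{-1}(H^K)^{\mathsf T}\right\},$$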
where $\hat X'^K$ is the minimum mean-square estimate of $X'^K$, and $\Sigma_{X'}$ is the covariance matrix of $X'^K$. Since $X'^K$ has independent Gaussian components $X'_k \sim \mathcal{N}(0, 1 - \mathrm{snr}^{-1}\rho^{-(2k+1)})$, $\Sigma_{X'}$ is a $K \times K$ diagonal matrix with entries $1 - \mathrm{snr}^{-1}\rho^{-(2k+1)}$, $k = 0, \ldots, K-1$. A computation yields

$$\mathrm{mmse}(\mathrm{snr}) = \mathrm{snr}^{-1}\sum_{k=0}^{K-1}\left(1 - \mathrm{snr}^{-1}\rho^{-(2k+1)}\right).$$

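When $\alpha\beta$ becomes large, we obtain by transition to a Riemann integral

$$\mathrm{mmse}(\mathrm{snr})\cdot\delta = \mathrm{snr}^{-1}\sum_{k=0}^{K-1}\left(1 - \mathrm{snr}^{-1}e^{(2k+1)\delta}\right)\delta = \frac{1}{2\,\mathrm{snr}}\int_0^{\ln\mathrm{snr}}\left(1 - \mathrm{snr}^{-1}e^{x}\right)dx + \epsilon = \frac{1}{2}\,\frac{\ln\mathrm{snr}}{\mathrm{snr}} - \frac{1}{2\,\mathrm{snr}}\left(1 - \frac{1}{\mathrm{snr}}\right) + \epsilon,$$

where $\epsilon \to 0$ as $\alpha\beta \to \infty$. Now, returning to the scaling $\sigma^2 = 1$, $0 < \theta^2 < 1$, Eq. (48) applies and we obtain, observing $\delta \sim 1/(\alpha\beta)$ as $\alpha\beta \to \infty$,

$$\mathrm{mmse}(\mathrm{snr}) \sim \mathrm{llse}(\mathrm{snr}) - \frac{\alpha\beta}{2\,\mathrm{snr}}\left(1 - \frac{1}{\mathrm{snr}}\right). \qquad (50)$$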



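By averaging with respect to the DoF $\alpha\beta$, the asymptotic equations (48), (50) turn into true equations, and we get for LLSE and MMSE, respectively,

$$\overline{\mathrm{llse}}(\mathrm{snr}) = \lim_{\alpha\beta\to\infty}\frac{\mathrm{llse}(\mathrm{snr})}{\alpha\beta} = \frac{1}{2}\,\frac{\ln\mathrm{snr}}{\mathrm{snr}},$$

$$\overline{\mathrm{mmse}}(\mathrm{snr}) = \lim_{\alpha\beta\to\infty}\frac{\mathrm{mmse}(\mathrm{snr})}{\alpha\beta} = \overline{\mathrm{llse}}(\mathrm{snr}) - \frac{1}{2\,\mathrm{snr}}\left(1 - \frac{1}{\mathrm{snr}}\right).$$

In Fig. 11, $\overline{\mathrm{llse}}(\mathrm{snr})$ and $\overline{\mathrm{mmse}}(\mathrm{snr})$ are plotted against $10\log_{10}\mathrm{snr}$ for $\mathrm{snr} > 1$.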
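The per-DoF limits above can be illustrated by evaluating the exact finite-$K$ LLSE and MMSE under the scaling $\sigma^2 = 1$, $\theta^2 = 1/\mathrm{snr}$. The sketch uses the per-component Gaussian MMSE $h_{kk}^2 v_k/(1 + \mathrm{snr}\,h_{kk}^2 v_k)$, which is the diagonal case of the trace formula above, with $v_k = 1 - \mathrm{snr}^{-1}\rho^{-(2k+1)}$; the parametrization $\tanh\delta = 1/(\alpha\beta)$ and the function names are assumptions for illustration only.

```python
import math


def mmse_llse(snr, ab):
    """Exact finite-K MMSE and LLSE for channel (49) under the scaling
    sigma^2 = 1, theta^2 = 1/snr (sketch; assumes tanh(delta) = 1/ab)."""
    delta = math.atanh(1 / ab)
    rho = math.exp(-delta)
    K = 0
    while rho ** (2 * K + 1) > 1 / snr:        # active subchannels: sigma_k^2 > theta^2
        K += 1
    mmse = 0.0
    for k in range(K):
        h2 = rho ** (2 * k + 1)                # squared gain h_kk^2
        v = 1 - rho ** -(2 * k + 1) / snr      # input variance (Sigma_X')_kk
        mmse += h2 * v / (1 + snr * h2 * v)    # Gaussian MMSE of one component
    llse = K / snr                             # K * theta^2
    return mmse, llse


def per_dof_limits(snr):
    """Limits of mmse/(alpha*beta) and llse/(alpha*beta) as alpha*beta -> infinity."""
    llse_bar = 0.5 * math.log(snr) / snr
    mmse_bar = llse_bar - (1 - 1 / snr) / (2 * snr)
    return mmse_bar, llse_bar


if __name__ == "__main__":
    ab = 1000.0
    for snr in (2.0, 10.0, 100.0):
        mmse, llse = mmse_llse(snr, ab)
        mmse_bar, llse_bar = per_dof_limits(snr)
        print(f"snr = {snr}: mmse/ab = {mmse / ab:.5f} ({mmse_bar:.5f}), "
              f"llse/ab = {llse / ab:.5f} ({llse_bar:.5f})")
```

The gap between the two columns is the term $\frac{1}{2\,\mathrm{snr}}(1 - 1/\mathrm{snr})$ per degree of freedom.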



5.2.3 Application to MIMO Systems

The scaling assumption in Proposition 1 has only been needed to give the RHS of Eq. (47) the meaning of an estimation error. Without this assumption, it always holds that

$$\frac{d}{d\,\mathrm{snr}}\,C_0(\mathrm{snr}) \sim \frac{1}{2}\left[\mathrm{mmse}(\mathrm{snr}) + \frac{\alpha\beta}{2\,\mathrm{snr}}\left(1 - \frac{1}{\mathrm{snr}}\right)\right]. \qquad (51)$$

Since the channel model (43) (or (49), cf. [29]) also applies to a MIMO system with $K$ transmit and $K$ receive antennas and additive Gaussian noise, Eq. (51)

Figure 11: Average energy per degree of freedom of LLSE and MMSE as DoF $\alpha\beta \to \infty$. Scaling $\sigma^2 = 1$, $0 < \theta^2 < 1$ of parameters $\sigma^2, \theta^2$ assumed.

may be used to estimate the capacity of certain MIMO systems. The capacity of a general MIMO system (43) has been computed in [41] and, in case of a diagonal channel matrix, optimal input power allocation policies are given in [42]. However, in contrast to [29], [41], [42], and [43], in our setting the probability distribution of optimal input signals depends on snr. Now, Eq. (51) tells us that, when $\alpha\beta$ is sufficiently large, with growing snr [thus, by Eq. (24), $K = \alpha\beta\ln\sqrt{\mathrm{snr}} \to \infty$] the increase of channel capacity is significantly higher than anticipated by the I-MMSE relationship (44), at least in the lower snr region (cf. Fig. 11). The import of this observation is that high-dimensional MIMO systems might perform significantly better than hitherto assumed.

6 Conclusion 

In this paper, we considered the heat channel, in its most basic form a linear 
time-varying analog filter with subsequent AWGN. The representation of the 
filter by means of its orthonormal eigenfunctions (dilated Hermite functions) 
not only leads to a diagonalization of the filter but also allows for a precise 
definition of measurement of the noisy filter output in terms of optimal detec- 
tion. This leads to a discretization of the original continuous-time heat channel 
and results in its final form, an infinite-dimensional vector Gaussian channel 
(where the measurement error is incorporated into the noise variances). Under 
a certain growth condition on the average input energy, a further reduction to a finite set of parallel Gaussian channels is possible, resulting in a closed-form expression for channel capacity in terms of the input energy. We then proposed a method for capacity-achieving signaling. The capacity formula is an asymptotic equation; time averaging yields an equation and capacity (in bits per second) as a function of average input power, thus allowing a comparison with the capacity (or spectral efficiency) of the bandlimited Gaussian channel and of the Gallager channel with Gaussian filter. Surprisingly, the spectral efficiency of the Gallager channel (with Gaussian filter) is, in general, inferior to that of the heat channel. These findings give evidence that the classical capacity result [7, Theorem 8.5.1] may lead to an overly conservative assessment of the attainable information throughput of communication systems (e.g., in fiber optics in the presence of dispersion and amplifier noise). On the other hand, we found that the water-filling characterization of the capacity of the heat channel in the time-frequency plane and that of the Gallager channel (with Gaussian filter) in the frequency domain compare well. We then computed the rate distortion function for the white Gaussian noise response of the filter. The Lambert W function proved to be an indispensable tool for the statement of closed-form channel capacity/rate distortion formulas. Since the Lambert W function is not an elementary function, we noticed that a simpler capacity formula based on the measured filter output signal is possible. The new setting was put into the framework of estimation theory, resulting in a "C-LLSE formula" connecting the capacity of the heat channel with an estimation error for the output. Thus, in a sense, that particular formula can be seen as a parallel to the general I-MMSE relationship discovered in [52] (where SNR-dependent inputs are precluded). The interpretation of our results in the context of MIMO systems indicates that high-dimensional MIMO systems might perform significantly better than assumed.

Appendix 

A Wigner-Ville Spectrum of Filter Response on White Gaussian Noise

We model white Gaussian noise of two-sided noise PSD $\sigma^2 \in (0, \infty)$ by a sequence of stochastic processes $\{U^K(t), t \in \mathbb{R}\}$, $K = 1, 2, \ldots$, given by their respective Karhunen-Loève expansion

$$U^K(t) = \sum_{k=0}^{K-1} U_k\,(D_\gamma\psi_k)(t), \quad t \in \mathbb{R},$$

where $U_0, \ldots, U_{K-1}$ are i.i.d. Gaussian random variables $\sim \mathcal{N}(0, \sigma^2)$. For any $K = 1, 2, \ldots$, let the process $\{U^K(t), t \in \mathbb{R}\}$ be the input to filter $P_\delta^{(\gamma)}$. By means of representation (5) it is seen that the corresponding filter output tends as $K \to \infty$ to the stochastic process

$$X(t) = \sum_{k=0}^{\infty}\rho^{k+1/2}\,U_k\,(D_\gamma\psi_k)(t), \quad t \in \mathbb{R}. \qquad (52)$$

We interpret $\{X(t), t \in \mathbb{R}\}$ as the filter response to white Gaussian noise.

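Since any realization $x(t)$ of $\{X(t)\}$ is almost surely in $L^2(\mathbb{R})$, the Wigner distribution

$$(Wx)(t, \omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} x\!\left(t + \frac{t'}{2}\right)\overline{x\!\left(t - \frac{t'}{2}\right)}\,e^{-i\omega t'}\,dt'$$

may be computed. By taking the ensemble average, we obtain the WVS of process $\{X(t)\}$,

$$\Phi(t, \omega) = \mathbb{E}\{(WX)(t, \omega)\} = \frac{1}{2\pi}\int_{-\infty}^{\infty} r\!\left(t + \frac{t'}{2}, t - \frac{t'}{2}\right)e^{-i\omega t'}\,dt', \qquad (53)$$

where $r(t_1, t_2) = \mathbb{E}\{X(t_1)X(t_2)\}$ is the autocorrelation function. The kernel of operator $P_\delta^{(\gamma)}$ has, for arbitrary parameters $\gamma > 0$, $\delta > 0$, two alternative representations (following from a slight generalization of Mehler's formula):

$$p_\delta^{(\gamma)}(x, y) = \sum_{k=0}^{\infty}\rho^{k+1/2}(D_\gamma\psi_k)(x)(D_\gamma\psi_k)(y) = \frac{1}{\gamma\sqrt{2\pi\sinh\delta}}\exp\left\{-\frac{1}{4\gamma^2}\left[\coth\frac{\delta}{2}\,(x - y)^2 + \tanh\frac{\delta}{2}\,(x + y)^2\right]\right\}.$$

We infer by the first representation that $r(t_1, t_2) = \sigma^2 p_{2\delta}^{(\gamma)}(t_1, t_2)$. Then, by means of the second representation, the integral in (53) is readily evaluated, and we obtain for the WVS

$$\Phi(t, \omega) = \frac{\sigma^2}{2\pi\cosh\delta}\exp\left[-(\tanh\delta)\left(\frac{t^2}{\gamma^2} + \gamma^2\omega^2\right)\right].$$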



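B Proofs of Theorems 2 and 4

For any bounded operator $A: L^2(\mathbb{R}) \to L^2(\mathbb{R})$ the Weyl symbol $\sigma_A(x, \xi)$ is defined by

$$(Af)(x) = \frac{1}{2\pi}\iint_{\mathbb{R}^2}\sigma_A\!\left(\frac{x + y}{2}, \xi\right)e^{i(x - y)\xi}f(y)\,dy\,d\xi, \qquad (54)$$

see, e.g., [53]; the linear map $A \mapsto \sigma_A(x, \xi)$ is called the Weyl correspondence. For example, the operator $A = P_{2\delta}^{(\gamma)}$ has the Weyl symbol

$$\sigma_A(x, \xi) = \frac{1}{\cosh\delta}\exp\left[-(\tanh\delta)\left(\frac{x^2}{\gamma^2} + \gamma^2\xi^2\right)\right]. \qquad (55)$$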
In the rest of this appendix, $A$ will always stand for operator $P_{2\delta}^{(\gamma)}$, and $\lambda_k$, $k = 0, 1, \ldots$, for its eigenvalues $\rho^{2k+1}$. The proof of the subsequent Theorem 6 follows the argument in [44] (cf. also [45]); the Szegő theorems in [24], [45] are inadequate for our purposes, though.

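Lemma 1 For any polynomial $G_N(x, z) = \sum_{n=1}^{N}c_n(x)z^n$ with bounded variable coefficients $c_n(x) \in \mathbb{R}$, $x \in (1, \infty)$, it holds

$$\sum_{k=0}^{\infty}G_N(\alpha\beta, \lambda_k) = \frac{1}{2\pi}\iint G_N(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi + o(\alpha\beta) \quad \text{as } \alpha\beta \to \infty. \qquad (56)$$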

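Proof: First, by the spectral representation of $A$, for any $f \in L^2(\mathbb{R})$ it holds that

$$G_N(\alpha\beta, A)f = \sum_{k=0}^{\infty}G_N(\alpha\beta, \lambda_k)\,\langle f, D_\gamma\psi_k\rangle\,D_\gamma\psi_k.$$

Hence, operator $B = G_N(\alpha\beta, A)$ has the trace

$$\mathrm{tr}\,B = \sum_{k=0}^{\infty}G_N(\alpha\beta, \lambda_k). \qquad (57)$$

Secondly, we use the key observation [24, trace rule (0.4)] to obtain (here and thereafter, double integrals extend over $\mathbb{R}^2$)

$$\mathrm{tr}\,B = \frac{1}{2\pi}\iint\sigma_B(x, \xi)\,dx\,d\xi,$$

where $\sigma_B(x, \xi)$ is the Weyl symbol of operator $B$. By linearity of the Weyl correspondence, the Weyl symbol of $B$ has the expansion

$$\sigma_B(x, \xi) = \sum_{n=1}^{N}c_n(\alpha\beta)\,\sigma_{A^n}(x, \xi).$$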



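Since for any $\gamma > 0$ held constant the family of operators $\{P_\delta^{(\gamma)}, \delta > 0\}$ forms a semigroup with respect to $\delta$ (see [14]), it follows that $A^n = P_{2n\delta}^{(\gamma)}$. In (55), replace operator $A$ by $A^n$ and $\delta$ by $n\delta$. Because of $\tanh(n\delta) = (n\tanh\delta)(1 + o(1))$, we then obtain

$$\sigma_{A^n}(x, \xi) = \frac{1}{\cosh(n\delta)}\exp\left[-\tanh(n\delta)\left(\frac{x^2}{\gamma^2} + \gamma^2\xi^2\right)\right] = (1 + o(1))\,(\sigma_A(x, \xi))^n\exp\left[o(1)\,n\tanh\delta\left(\frac{x^2}{\gamma^2} + \gamma^2\xi^2\right)\right],$$

where the Landau symbol $o(1)$ stands for various quantities vanishing as $\delta \to 0$ (or $\alpha\beta \to \infty$). We now estimate

$$\mathrm{tr}\,B = \frac{1}{2\pi}\iint\sigma_B(x, \xi)\,dx\,d\xi = \frac{\alpha\beta}{2\pi}\iint G_N(\alpha\beta, \sigma_A(\alpha x, \beta\xi))\,dx\,d\xi + \epsilon\,\alpha\beta = \frac{1}{2\pi}\iint G_N(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi + \epsilon\,\alpha\beta, \qquad (58)$$

where $\epsilon \to 0$ as $\alpha\beta \to \infty$. Eq. (58) in combination with Eq. (57) concludes the proof. □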

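Theorem 6 (Szegő Theorem) Let $g: [0, \Lambda] \to \mathbb{R}$, $\Lambda \in (0, \infty)$, be a continuous function such that $\lim_{x \to 0+} g(x)/x$ exists. For any functions $a, b: (1, \infty) \to \mathbb{R}$, where $a(x)$ is bounded and $b(x) \in [0, \Lambda]$, define the function $G(x, z) = a(x)g(b(x)z)$, $(x, z) \in (1, \infty) \times [0, 1]$. Then it holds

$$\sum_{k=0}^{\infty}G(\alpha\beta, \lambda_k) = \frac{1}{2\pi}\iint G(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi + o(\alpha\beta) \quad \text{as } \alpha\beta \to \infty. \qquad (59)$$

Proof: The function $f(x) = g(x)/x$, $x \in (0, \Lambda]$, has a continuous extension $F(x)$ onto the compact interval $[0, \Lambda]$. By virtue of the Weierstrass approximation theorem, for any $n \in \mathbb{N}$ there exists a polynomial $F_{N_n - 1}(x)$ of some degree $N_n - 1$ such that $|F(x) - F_{N_n - 1}(x)| < \epsilon_n$ for all $x \in [0, \Lambda]$, where $\epsilon_n \to 0$ as $n \to \infty$. Consequently, the polynomial $g_{N_n}(x) = xF_{N_n - 1}(x)$ of degree $N_n$ satisfies the inequality

$$|g(x) - g_{N_n}(x)| \leq \epsilon_n x, \quad x \in [0, \Lambda]. \qquad (60)$$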

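Define the polynomial with variable coefficients $G_{N_n}(x, z) = a(x)g_{N_n}(b(x)z)$. We now show that

$$(\alpha\beta)^{-1}\sum_{k=0}^{\infty}G_{N_n}(\alpha\beta, \lambda_k) \to (\alpha\beta)^{-1}\sum_{k=0}^{\infty}G(\alpha\beta, \lambda_k) \qquad (61)$$

and

$$\frac{1}{2\pi\alpha\beta}\iint G_{N_n}(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi \to \frac{1}{2\pi\alpha\beta}\iint G(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi \qquad (62)$$

as $n \to \infty$, uniformly for $\alpha\beta \in (1, \infty)$.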
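Proof of (61): By Ineq. (60) we get

$$\left|\sum_{k=0}^{\infty}G(\alpha\beta, \lambda_k) - \sum_{k=0}^{\infty}G_{N_n}(\alpha\beta, \lambda_k)\right| \leq \sum_{k=0}^{\infty}\left|G(\alpha\beta, \lambda_k) - G_{N_n}(\alpha\beta, \lambda_k)\right| \leq M\epsilon_n\Lambda\sum_{k=0}^{\infty}\lambda_k,$$

where $M = \sup\{|a(x)|;\ x > 1\} < \infty$ and $\sum_{k=0}^{\infty}\lambda_k = \rho/(1 - \rho^2) \leq \alpha\beta/2$ for $\alpha\beta > 1$. After division of the last inequality by $\alpha\beta$, convergence in (61) follows as claimed.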

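Proof of (62): Similarly,

$$\left|\iint G(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi - \iint G_{N_n}(\alpha\beta, \sigma_A(x, \xi))\,dx\,d\xi\right| \leq \iint\left|G(\alpha\beta, \sigma_A(x, \xi)) - G_{N_n}(\alpha\beta, \sigma_A(x, \xi))\right|dx\,d\xi \leq M\epsilon_n\Lambda\iint\sigma_A(x, \xi)\,dx\,d\xi.$$

Since $(2\pi)^{-1}\iint\sigma_A(x, \xi)\,dx\,d\xi = \rho/(1 - \rho^2)$, we arrive at the same conclusion as before.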

Now, choose any number $n \in \mathbb{N}$, substitute the function $G$ in Eq. (59) by the polynomial $G_{N_n}$, and divide both sides of that equation by $\alpha\beta$. Let $n \to \infty$. Then, by reason of Lemma 1 and the uniform convergence in (61) and (62) with respect to $\alpha\beta \in (1, \infty)$, the theorem follows. □
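Theorem 6 can be illustrated numerically for a concrete $G$. The sketch below rests on assumptions that are not stated in this appendix: $\tanh\delta = 1/(\alpha\beta)$ and $\gamma^2 = \alpha/\beta$, under which (55) becomes $\sigma_A(x, \xi) = (\cosh\delta)^{-1}\exp(-(x^2/\alpha^2 + \xi^2/\beta^2))$. In elliptic polar coordinates the right-hand side of (59) then collapses to the one-dimensional integral $(\alpha\beta/2)\int_0^\infty G(e^{-u}/\cosh\delta)\,du$, which is compared with the eigenvalue sum. The function name is hypothetical.

```python
import math


def szego_sides(G, ab, n_quad=20000, u_max=60.0):
    """Both sides of the Szego-type relation (59) for one value of alpha*beta.

    Assumptions (illustrative): tanh(delta) = 1/ab, eigenvalues
    lambda_k = exp(-(2k+1) delta), and
    sigma_A(x, xi) = exp(-(x^2/alpha^2 + xi^2/beta^2)) / cosh(delta).
    """
    delta = math.atanh(1.0 / ab)
    # left-hand side: sum over eigenvalues, truncated where lambda_k < exp(-u_max)
    lhs, k = 0.0, 0
    while (2 * k + 1) * delta < u_max:
        lhs += G(math.exp(-(2 * k + 1) * delta))
        k += 1
    # right-hand side: (1/2pi) * double integral; elliptic polar coordinates reduce
    # it to (ab/2) * int_0^inf G(exp(-u)/cosh(delta)) du (midpoint rule)
    c = 1.0 / math.cosh(delta)
    h = u_max / n_quad
    rhs = ab / 2 * h * sum(G(c * math.exp(-(i + 0.5) * h)) for i in range(n_quad))
    return lhs, rhs


if __name__ == "__main__":
    def G(z):
        return 0.5 * math.log(max(20.0 * z, 1.0))   # a = 1, b = 20, g = 1/2 ln+

    for ab in (10.0, 100.0, 1000.0):
        lhs, rhs = szego_sides(G, ab)
        print(f"alpha*beta = {ab:6.0f}: sum = {lhs:.3f}, integral = {rhs:.3f}")
```

The two choices of $G$ used below mirror the functions $g(x) = \min\{1, x\}$ and $g(x) = \frac{1}{2}\ln^{+}x$ that appear in the proofs of Theorems 4 and 2.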

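Proof of Theorem 2: Define

$$\ln^{+}x = \begin{cases}\max\{0, \ln x\} & \text{if } x > 0,\\ 0 & \text{if } x = 0.\end{cases}$$

Because of Eq. (11) we have, recalling that $\sigma^2$ is dependent on $\alpha\beta$,

$$C(S) = \sum_{k=0}^{\infty}\frac{1}{2}\ln^{+}\frac{\sigma^2(\alpha\beta)\lambda_k}{\theta^2} = \sum_{k=0}^{\infty}a(\alpha\beta)\,g(b(\alpha\beta)\lambda_k),$$

where $a(\alpha\beta) = 1$, $b(\alpha\beta) = \sigma^2(\alpha\beta)/\theta^2$, $g(x) = \frac{1}{2}\ln^{+}x$, $x \in [0, \Lambda]$, and $\Lambda$ is chosen so that $b(\alpha\beta) \leq \Lambda < \infty$ when $\alpha\beta$ is large enough (the latter choice is possible since $\sigma^2(\alpha\beta)$ is finitely upper bounded as $\alpha\beta \to \infty$). Without loss of generality, we assume $b(\alpha\beta) \in [0, \Lambda]$ for all $\alpha\beta \in (1, \infty)$. Then, by Theorem 6, it follows that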



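$$C(S) \sim \frac{1}{2\pi}\iint\frac{1}{2}\ln^{+}\frac{\sigma^2(\alpha\beta)\,\sigma_A(x, \xi)}{\theta^2}\,dx\,d\xi = \frac{1}{2\pi}\iint\frac{1}{2}\ln^{+}\frac{\sigma^2(\alpha\beta)/(2\pi)}{N(x, \xi)}\,dx\,d\xi,$$

where $N(x, \xi) = \frac{\theta^2}{2\pi}(\sigma_A(x, \xi))^{-1}$. Next, rewrite Eq. (14) as

$$S = \sum_{k=0}^{\infty}\sigma^2(\alpha\beta)\left(1 - \frac{\theta^2}{\sigma^2(\alpha\beta)\lambda_k}\right)^{+}.$$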
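Set $a(\alpha\beta) = \sigma^2(\alpha\beta)$, $b(\alpha\beta) = \sigma^2(\alpha\beta)/\theta^2$, and define

$$g(x) = \begin{cases}\max\{0, 1 - 1/x\} & \text{if } x > 0,\\ 0 & \text{if } x = 0.\end{cases}$$

Without loss of generality, we assume that $a(\alpha\beta)$ is bounded for all $\alpha\beta \in (1, \infty)$. So, $b(\alpha\beta) \in [0, \Lambda]$, where $\Lambda = \sup\{a(\alpha\beta)/\theta^2;\ \alpha\beta > 1\} < \infty$. Then, by Theorem 6, it follows that

$$S \sim \frac{1}{2\pi}\iint\left(\sigma^2(\alpha\beta) - \frac{\theta^2}{\sigma_A(x, \xi)}\right)^{+}dx\,d\xi = \iint\left(\frac{\sigma^2(\alpha\beta)}{2\pi} - N(x, \xi)\right)^{+}dx\,d\xi.$$

Finally, replacement of $\sigma^2(\alpha\beta)/(2\pi)$ by the parameter $\nu$ completes the proof. □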

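Proof of Theorem 4: For any $\theta \in (0, \sigma]$ held constant, define the distortion $D$ by Eq. (35) or, equivalently, by

$$D = \sum_{k=0}^{\infty}\theta^2\min\left\{1, \frac{\sigma^2}{\theta^2}\lambda_k\right\}.$$

Since $D = \sum_{k=0}^{\infty}a(\alpha\beta)g(b(\alpha\beta)\lambda_k)$, where $a(\alpha\beta) = \theta^2$, $b(\alpha\beta) = \sigma^2/\theta^2$, $g(x) = \min\{1, x\}$ for $x \in [0, \Lambda]$, $\Lambda = \sigma^2/\theta^2$, it follows by Theorem 6 that

$$D \sim \frac{1}{2\pi}\iint\theta^2\min\left\{1, \frac{\sigma^2}{\theta^2}\sigma_A(x, \xi)\right\}dx\,d\xi = \iint\min\left\{\frac{\theta^2}{2\pi}, \Phi(t, \omega)\right\}dt\,d\omega, \qquad (63)$$

where $\Phi(t, \omega) = \frac{\sigma^2}{2\pi}\sigma_A(t, \omega)$ is the WVS (12). Next, rewrite Eq. (38) as

$$R = \sum_{k=0}^{\infty}\frac{1}{2}\ln^{+}\left(\frac{\sigma^2}{\theta^2}\lambda_k\right).$$

Taking $a(\alpha\beta) = 1$, $b(\alpha\beta) = \sigma^2/\theta^2$, $g(x) = \frac{1}{2}\ln^{+}x$, $x \in [0, \Lambda]$, $\Lambda$ chosen as before, we infer by Theorem 6 that

$$R \sim \frac{1}{2\pi}\iint\frac{1}{2}\ln^{+}\left(\frac{\sigma^2}{\theta^2}\sigma_A(x, \xi)\right)dx\,d\xi = \frac{1}{2\pi}\iint\frac{1}{2}\ln^{+}\frac{\Phi(t, \omega)}{\theta^2/(2\pi)}\,dt\,d\omega. \qquad (64)$$

Finally, replacement of $\theta^2/(2\pi)$ by the parameter $\lambda$ completes the proof. □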